PROGRAMMING CONSIDERATIONS FOR PARALLEL COMPUTERS

by 

E. Draughon, R. Grishman, J. Schwartz, and A. Stein
Courant  Institute  of  Mathematical  Sciences,  New  York  University 

1 .   Introduction 

The  present  article  reports  on  some  investigations, 
concerning  software  for  a  class  of  hypothetical  highly 
parallel  computers,  which  have  been  conducted  during  the 
past  year  at  the  Courant  Institute.   The  general  class  of 
computer  to  which  these  investigations  apply,  called  Athene 
class  computers  in  a  previous  article  [1],  is  characterized 
by  the  following  properties: 

(a)  several processing units share a common memory;

(b)  each  processing  unit  has  its  own  instruction  location 
counter,  and  may  therefore  execute  unconditional  or  conditional 
transfers  independently  of  all  the  other  processors; 

(c)  processors  may  execute  code  sequences  located  in  their 
common  memory; 

(d)  the  processors  comprising  the  complete  system  operate 
asynchronously,  that  is,  in  no  close  time  relationship  to 
each  other; 

(e)  each  processor  may  execute  all  the  instructions  of  a 
reasonably  adequate  machine  instruction  set.   Each  processor 


has its own upper and lower memory limit registers, providing
memory protection for a multiprogramming operating system.
In  addition,  each  processor  may  execute  the  instructions 
belonging  to  a  small  set  of  special  "collective"  or 
"coordinating"  instructions  (cf.  below  for  additional  details). 

A  simulator  for  a  hypothetical  machine  of  this  sort  has 
been  written  to  run  on  the  CDC  6600  presently  at  the  Courant 
Institute.   This  simulator,  called  SOAPSUDS,  is  adapted  from 
an autosimulator developed previously for the CDC 6600.
It simulates up to 60 processors, each with the instruction
set of the 6600, sharing a common memory. In addition, the
simulator  incorporates  various  coordinating  instructions 
(the  precise  set  included  has  varied  through  the  course  of 
our  experiments).   Finally,  various  aids  to  debugging  and 
simulated  performance  measurement  are  included  in  the  simulator. 

In  the  course  of  our  experimentation  with  and 
consideration  of  the  SOAPSUDS  parallel  computer  simulator  and 
associated  systems,  we  have  gathered  information  concerning 
both  the  theoretical  and  practical  aspects  of  writing  parallel 
programs. 

In  what  follows  we  shall  discuss  the  various  questions 
having  to  do  with  storage  usage,  program  flow  organization, 
system  efficiency,  methods  that  have  been  developed  to  handle 
parallel  programming  by  normal  Fortran  techniques,  and  a  new 
parallel oriented Fortran (PFORTRAN) incorporating these techniques.


A  first  appendix  will  give  additional  information 
concerning  the  PFORTRAN  compiler.   A  second  appendix  will 
describe  a  hardware  compiler-simulator  developed  to  make 
future  detailed  studies  of  logic  design  possible. 

2.   SOAPSUDS:   A  description  of  the  simulated  machine  and 
the  simulator. 

Soapsuds  is  a  program  written  for  the  CDC  6600  to 
simulate  a  computer  system  consisting  of  a  set  of  central 
processing units, each with its own instruction location
counter, operating asynchronously from a common memory.
Each CPU has an instruction set adequate for a general
purpose computer, and can execute unconditional and conditional
branches independently of the other processors. The large
central memory is also divided into independent modules, the
processors and memory modules being interconnected in a "central
exchange."

The  individual  processors  of  the  simulated  system  are 
essentially  the  same  as  the  6600  central  processor,  with  a 
60-bit  word  length  and  24  operating  registers.   This,  of 
course,  makes  simulator  coding  and  usage  much  easier.   In 
addition,  the  large  word  size  and  register-oriented  instruction 
set  reduce   the  rate  of  memory  references,  making  this  processor 
suitable  for  a  system  in  which  memory  speed  and  memory-processor 
communication  may  be  major  factors  in  overall  speed. 


Several instructions have been added to the normal
instruction set of the 6600 to meet the needs encountered in
a parallel processing environment. Since the processors are
identical and operate from a common memory, some simple
device enabling a processor to tell "which one it is" appears
necessary. A unique identifying number for each processor
in the system is needed, for example, within the operating
system or other code executed by all processors, in which
code tables are partitioned for use by the individual
processors. In particular, a hardware identification is
necessary if one processor is able to interrupt other processors,
since the specification of which processors are to be interrupted
must ultimately be translated into hardware terms. For this
reason we have included the instruction READ BADGENUMBER in
all our studies of parallel processor simulation. This
instruction puts the number (which lies between 0 and n-1 for
a set of n processors) of the processor executing the
instruction into an index register. Suggestions have been made for
an "assigned badgenumber" scheme, which, in contrast to the
above (absolute) badgenumber, assigns each processor executing
a particular job a unique identifying number, starting with zero.
It seems satisfactory at the moment, however, to carry
identifiers of this latter type in an ordinary index register.

Three instructions which facilitate coordination of
processor activity have been studied: OR, ZERO, AND SKIP;
OR TO STORAGE; and REPLACE ADD. The OR, ZERO, AND SKIP takes
the logical sum of the designated X registers of all processors
executing the instruction simultaneously and leaves the
result in all these registers, clears the present instruction
word to a no-op, and skips to the present instruction location+2.
The OR TO STORAGE forms the logical sum of an X register and a
memory location, leaving the result in memory. The REPLACE ADD
computes the integer sum of a memory location and an X register,
leaving the result both in memory and in the register;
furthermore, no other processor may reference the addressed
location between the time one processor accesses the location
and the time at which the sum is stored.

The  OR,  ZERO,  AND  SKIP  (OZS)  was  used,  for  example,  in 
programming  a  fork  procedure;  the  necessary  code  constituted 
about  20  instructions.   A  typical  application  of  the 
OR  TO  STORAGE  is  the  assemble  or  join  operation:   each  processor 
is  assigned  a  bit  in  a  word,  and  when  all  bits  have  been  set, 
all  CPUs  have  been  assembled.   Because  two  processors  may  OR 
their  bits  in  before  either  has  an  opportunity  to  read  the  word, 
however,  there  is  some  difficulty  in  selecting  a  "last  processor" 
to execute subsequent code. Various difficulties and
inconveniences of this type caused the abandonment of both OZS
and ORS (the OR TO STORAGE), and their replacement with the more
useful REPLACE ADD instruction.


The  REPLACE  ADD  instruction  is  now  used  to  implement  all 
processor-coordinating  operations.   It  permits  very  efficient 
fork  and  join,  lock  and  unlock  operations.   At  the  same  time 
it  provides,  in  a  single  instruction,  a  basic  system  operation 
-- the updating of pointers -- which would otherwise require
a  lock,  update,  unlock  sequence  to  prevent  two  processors 
from  accessing  a  pointer  simultaneously.   The  REPLACE  ADD 
overcomes  the  timing  difficulties  inherent  in  the  OZS  by  being 
storage  oriented,  and  other  difficulties  inherent  in  the 
OR  TO  STORAGE  by  leaving  its  result  directly  in  a  register. 
In  hardware  terms,  the  instruction  would  probably  require  a 
memory  module  lock  out  to  prevent  further  references  between 
the  time  the  original  contents  is  issued  to  the  CPU  and  the 
time  the  sum  is  returned. 
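
The pointer-updating use of REPLACE ADD described above can be sketched as follows. This is a minimal illustration in Python, not SOAPSUDS code: the names `Memory`, `replace_add` and `run_demo` are invented for the sketch, threads stand in for processors, and a Python lock stands in for the assumed memory module lockout.

```python
import threading


class Memory:
    """Toy common memory. One lock stands in for the memory-module lockout."""

    def __init__(self, size):
        self.cells = [0] * size
        self._lockout = threading.Lock()

    def replace_add(self, addr, x):
        # Fetch, add and store as one indivisible step. The sum is left in
        # memory and also returned, as if left in the caller's X register.
        with self._lockout:
            self.cells[addr] += x
            return self.cells[addr]


def run_demo(n_procs=8, per_proc=100):
    """Let n_procs simulated processors advance one shared pointer."""
    mem = Memory(1)                      # cell 0 holds the shared pointer
    seen = [[] for _ in range(n_procs)]  # values each processor received

    def processor(badge):
        for _ in range(per_proc):
            seen[badge].append(mem.replace_add(0, 1))

    threads = [threading.Thread(target=processor, args=(b,))
               for b in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return mem.cells[0], sorted(v for row in seen for v in row)
```

Because every call returns a distinct value, each simulated processor obtains its own pointer value without a separate lock, update, unlock sequence.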

Two  additional  instructions,  EXIT  and  EXCHANGE  EXIT, 
have  been  added  as  being  convenient  for  a  multiprogramming 
operating  system.   These  instructions  assume  that  each  CPU 
has  its  own  status  flip-flop,  and  that  depending  on  the  setting 
of this flip-flop the CPU can be in either "program mode" or
"resident mode." The EXIT instruction toggles this flip-flop,
and  at  the  same  time  jumps  and  resets  memory  limits  of  the 
processor.   When  EXIT  is  executed  by  a  processor  in  program 
mode,  the  transfer  address  and  memory  limits  which  result  are 
fixed;  they  become  those  of  the  operating  system  resident 
program.   When  EXIT  is  executed  in  resident  mode,  the  resulting 


transfer address and memory bounds -- those of the next job to
be executed -- may be specified in the EXIT instruction. Using
this instruction, a program can be prevented from accidentally
interfering with another job or with the system resident.

Using  an  EXCHANGE  EXIT  instruction,  a  processor  in  resident 
mode  can  interrupt  any  other  processor  and  transfer  it  to  another 
task  at  the  same  time;  all  the  registers  of  the  interrupted 
processor  are  saved  and  new  ones  loaded  from  memory.   Using  this 
instruction, one CPU can obtain the assistance of as many others
as  are  needed  to  service  a  real  time  interrupt,  for  example. 
When  real  time  requirements  of  this  kind  have  been  satisfied, 
the  original  registers  can  be  restored  and  the  job  on  which  the 
interrupted  processors  were  working  can  be  continued. 

The  SOAPSUDS  simulator  is  an  executing  simulator  for  the 
computer  system  as  described  above,  and  allows  up  to  60 
simulated  processors.   The  simulator  processes  one  instruction 
per CPU in each of its major phases, cycling through the
processors.   The  simulator  is  equipped  with  extensive  tracing, 
trapping  and  checking  features  and  with  various  options  for  the 
timing  of  routines  and  the  gathering  of  useful  statistics. 

The  present  simulator  includes  the  READ  BADGENUMBER, 
REPLACE  ADD,  and  EXIT  instructions.   In  addition,  recent  versions 
have  included  a  simulated  "hardware"  implementation  of  private 
storage  (cf.  the  next  section).   Storage  of  this  type  has  been 
simply  and  efficiently  implemented  in  SOAPSUDS  using  an 
indirect  addressing  scheme. 

3.  FORTRAN-oriented parallel programming methods. Storage allocation.

The availability of a replace-add instruction, and the use
of the conventions concerning variable naming and memory
allocation to be described in the present section, make the FORTRAN
programming of parallel programs easy.

We distinguish at the FORTRAN level between two types of
variables, PUBLIC and PRIVATE (as will be seen, this classification
cuts across the normal FORTRAN REAL, INTEGER, LOGICAL, etc. types).

Variables  declared  to  be  PRIVATE  are  taken  to  be  private  to 
each  processor  executing  the  FORTRAN  program.   That  is,  it  is 
assumed  either  that 

(a)  such  variables  are  stored  in  registers  private  to  each 
processor  but  not  in  memory,  or 

(especially if, as is usual, the number of registers per
processor is small)

(b)  the  machine  code  compiled  from  the  assumed  FORTRAN  source 
is  such  that  every  processor  loads  each  such  variable  from  and 
stores  each  such  variable  into  a  memory  cell  private  to  the 
processor. 

Implicit 'temporary storage' locations created by the
FORTRAN compiler are assumed to be private to each processor,
in the sense explained above.

Variables and arrays declared PUBLIC, on the other hand,
are taken to be common to all processors, so that values of
such variables set by any processor will subsequently be used
by any other processor, etc.

Typically  input  data,  output  data  and  intermediate 
results  that  are  needed  at  some  later  time  are  placed  in 
PUBLIC storage, while loop counters and temporary intermediate
data  are  kept  in  PRIVATE  storage. 

In  addition  to  the  two  main  types  of  storage  described 
above  and  directly  implemented  in  the  PFORTRAN  compiler  and 
the  SOAPSUDS  simulator,  two  other  secondary  storage  types  may 
be  distinguished  by  their  usage.   These  secondary  storage  types 
have  no  independent  logical  existence  in  either  the  compiler 
or  the  simulator  however;  they  are  programmed  in  terms  of 
PUBLIC  and  PRIVATE  storage  using  the  replace-add  instruction 
in  ways  that  will  be  explained  in  subsequent  sections  of  the 
present report. These two secondary forms of storage are:

(i)    Intermediate storage, which is storage from portions of
which all other CPUs will be temporarily excluded by some
particular CPU. Intermediate storage is typically used for
incomplete partial results, for the updating of arrays, etc.

(ii)   Control storage, which is used for communication between the
CPUs; it is PUBLIC except during modification by one of the
CPUs. Such modification is generally of such short duration
as not to warrant the use of the machinery required for the
implementation of intermediate storage. Control storage is currently
implemented very directly in terms of the replace-add
instruction.


4.  FORTRAN oriented parallel programming methods.

The  basic  NEWVAL  function. 

The 'replace add' machine instruction may be reached
conveniently from FORTRAN source programs by introducing a
single FORTRAN integer function NEWVAL(I,J). This function
has the value I+J. Moreover, each time it is called, it
changes the value of the variable I to I+J. If several
processors call this function simultaneously, the effect is
the same as if these processors called the function in some
serial order.
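
The serial-order property of NEWVAL can be illustrated with a small sketch. It is written in Python rather than PFORTRAN, and the names `PublicStore`, `call_concurrently` and `is_serial` are hypothetical; the internal lock merely models the indivisibility that the replace-add hardware is assumed to provide.

```python
import threading


class PublicStore:
    """Hypothetical stand-in for PUBLIC storage holding named integers."""

    def __init__(self, **initial):
        self._vars = dict(initial)
        self._guard = threading.Lock()  # models the indivisible replace-add

    def newval(self, name, j):
        # The value is I+J, and I itself is changed to I+J.
        with self._guard:
            self._vars[name] += j
            return self._vars[name]

    def value(self, name):
        with self._guard:
            return self._vars[name]


def call_concurrently(store, name, increments):
    """Call newval once per increment, each call from its own thread.

    Returns a list of (returned value, increment) pairs.
    """
    results = [None] * len(increments)

    def worker(k):
        results[k] = (store.newval(name, increments[k]), increments[k])

    threads = [threading.Thread(target=worker, args=(k,))
               for k in range(len(increments))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def is_serial(results, start):
    """Check that the results match some one-at-a-time order of calls.

    Assumes all increments are positive, so sorting by returned value
    recovers the order in which the calls took effect.
    """
    previous = start
    for value, inc in sorted(results):
        if value - inc != previous:
            return False
        previous = value
    return True
```

Whatever order the threads happen to run in, the returned values chain together as if the calls had been made one after another, which is the only property the later constructions rely on.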

Using  this  function  and  the  storage  conventions  explained 
above  has  the  following  consequences: 

(a)    FORTRAN statement sequences may be executed by several
processors at once without unanticipated interference.
Consider, e.g., a typical FORTRAN sequence such as

      DO 1 J = 1,10
1     A(I,J) = A(I,J) + B(I,J) * C(I,J)

Assume for the sake of discussion that the arrays A, B, C
have been declared to be PUBLIC, but that the integer
variables I and J have been declared PRIVATE. Then
(1)    During execution of the DO loop each processor has its
own 'private' value of the DO counter J; these values are
separately incremented. Each processor may also be executing
the loop with its own individual value of I.
(2)    However, the arrays A, B, C referenced by all processors
are the same. Since I and J are particular to a given
processor, it may nevertheless be true that the processors
are always referencing non-overlapping sets of array elements.
(b)    The basic function NEWVAL can be used to express various
of the 'FORK' and 'JOIN' instructions previously proposed as
fundamental terms for parallel coding (cf. [2] for such a
proposal). To have one processor branch to label 1 while all
others continue in sequence one uses a COMMON variable I
initialized to -1 and the statement

      IF(NEWVAL(I,1)) GO TO 1

To cause all processors to go to label 1 when the statement 2
has been executed M times or more (JOIN instruction, cf. [2])
the statement

2     IF(NEWVAL(I,1) .GT. M) GO TO 1

may be used.
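
The idea behind both statements is that a replace-add hands each caller a different value, so a test on that value can single out one processor. The following Python sketch illustrates this under stated assumptions: `PublicInt` and `fork_and_join` are hypothetical names, threads stand in for processors, and the join test used here (equality with the number of processors, which picks out the last arrival) is a choice made for the sketch and is not the `.GT. M` form given in the text.

```python
import threading


class PublicInt:
    """A PUBLIC integer with an indivisible replace-add (modelled by a lock)."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()

    def newval(self, j):
        with self._guard:
            self.value += j
            return self.value


def fork_and_join(n_procs):
    fork_var = PublicInt(-1)  # initialized to -1, as in the text
    join_var = PublicInt(0)   # counts arrivals at the join point
    singled_out = []
    last_to_arrive = []

    def processor(badge):
        # Exactly one caller receives 0 from the first replace-add.
        if fork_var.newval(1) == 0:
            singled_out.append(badge)
        # ... each processor would do its share of the work here ...
        # Exactly one caller sees the arrival count reach n_procs.
        if join_var.newval(1) == n_procs:
            last_to_arrive.append(badge)

    threads = [threading.Thread(target=processor, args=(b,))
               for b in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return singled_out, last_to_arrive
```

Which processor is singled out depends on timing, but exactly one is chosen at each point, which avoids the "last processor" difficulty noted earlier for OR TO STORAGE.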

A PUBLIC variable I may be reserved for temporary use
by a single processor as follows. With I, associate an
additional PUBLIC variable, called, e.g., ILOCK, which, when
I is not in use, is set to -1. The appropriate code sequence
is then

100   IF(NEWVAL(ILOCK,1)) GO TO 100

      [Here follows code making exclusive
      use of I]

      ILOCK = -1

The last statement releases I for use by a processor waiting at,
or subsequently entering, the 'reservation statement' 100.
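
A minimal Python sketch of this reservation scheme follows. The names `PublicInt`, `store` and `run_locked_updates` are hypothetical, and threads stand in for processors. The sketch assumes that a plain store to ILOCK cannot fall in the middle of another processor's replace-add, which is why `store` takes the same internal guard as `newval`.

```python
import threading
import time


class PublicInt:
    """A PUBLIC integer supporting replace-add and an ordinary store."""

    def __init__(self, value):
        self._value = value
        self._guard = threading.Lock()

    def newval(self, j):
        with self._guard:
            self._value += j
            return self._value

    def store(self, value):
        # A plain store, made indivisible with respect to newval.
        with self._guard:
            self._value = value


def run_locked_updates(n_procs=4, updates_each=25):
    ilock = PublicInt(-1)   # -1 means "I is not in use"
    shared = {"I": 0}       # the PUBLIC variable being protected

    def processor():
        for _ in range(updates_each):
            while ilock.newval(1) != 0:   # statement 100: try again
                time.sleep(0)             # give other threads a turn
            current = shared["I"]         # exclusive use of I begins
            time.sleep(0)                 # a deliberately slow update
            shared["I"] = current + 1
            ilock.store(-1)               # ILOCK = -1 releases I

    threads = [threading.Thread(target=processor) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["I"]
```

Only the caller that receives 0 proceeds; every other caller receives a positive value and repeats the test until the holder stores -1 again.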

(c)    Parallel DO loops, that is, loops enclosing code which
is to be executed a fixed number N of times, may be written
as follows. (In the code sequence below, we assume that the
DO loop increment is 1, that I and LIM are PUBLIC variables,
that I is set to 1 less than the desired loop initiation
value when the loop is entered, and that II is a PRIVATE
variable. Of course, these assumptions are readily relaxed.)

300   II = NEWVAL(I,1)
      IF(II .GT. LIM) GO TO 200

      [Body of loop; II occurs as a variable]

      GO TO 300
200   CONTINUE
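
The same loop can be sketched in Python to show how the iterations are shared out. The names `PublicInt` and `parallel_do` are hypothetical, threads stand in for processors, and the local variable `ii` plays the part of the PRIVATE variable II.

```python
import threading


class PublicInt:
    """A PUBLIC integer with an indivisible replace-add (modelled by a lock)."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()

    def newval(self, j):
        with self._guard:
            self.value += j
            return self.value


def parallel_do(first, lim, n_procs, body):
    """Run body(ii) once for each ii in first..lim, shared among n_procs."""
    i = PublicInt(first - 1)  # I is set to one less than the first value

    def processor():
        while True:
            ii = i.newval(1)  # 300  II = NEWVAL(I,1)
            if ii > lim:      #      IF(II .GT. LIM) GO TO 200
                break
            body(ii)          #      body of loop, then GO TO 300

    threads = [threading.Thread(target=processor) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Each index is handed to exactly one processor, and a processor that finishes its iteration early simply picks up the next unclaimed index.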

5.     FORTRAN  oriented  parallel  programming  methods. 
Verbs  added  to  the  PFORTRAN  compiler. 

While all necessary code for the coordination of parallel
programs can be written directly in terms of the basic NEWVAL
function, our programming experiments indicated the utility of
providing a more convenient and flexible method for writing
parallel programs. In order to experiment with alternative
parallel oriented compiler verbs, a Fortran-like compiler
PFORTRAN was written, the current specifications of which follow.
(i)    The basic features of Fortran IV are all available.
(ii)   Storage  types. 

The  compiler  generates  code  for  two  storage  types: 
PRIVATE  and  PUBLIC.   Private  storage  is  implicitly  indexed  by 
a  processor  badgenumber,  and  therefore  provides  each  operating 
CPU  with  its  own  individual  value  of  any  particular  PRIVATE 
variable.   PRIVATE  storage  is  implemented  by  reserving  a 
unique  area  for  each  given  private  variable  and  treating  all 
references  to  that  variable  as  a  reference  to  the  head  of  the 
area  plus  the  CPU  number. 
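
The addressing rule just described can be sketched as follows. This is an illustration in Python only: `MEMORY`, `HEAD`, `private_addr` and the layout of the two areas are assumptions made for the sketch, and threads stand in for CPUs.

```python
import threading

N_CPUS = 4
MEMORY = [0] * 64                   # toy common memory
HEAD = {"J": 0, "TOTAL": N_CPUS}    # head address of each private area


def private_addr(name, badge):
    # A reference to a PRIVATE variable is the head of its area plus the
    # CPU number.
    return HEAD[name] + badge


def processor(badge, limit):
    total = private_addr("TOTAL", badge)
    j = private_addr("J", badge)
    MEMORY[total] = 0
    MEMORY[j] = 1
    while MEMORY[j] <= limit:       # DO loop on the private counter J
        MEMORY[total] += MEMORY[j]
        MEMORY[j] += 1


def run_all(limits):
    """Run one simulated CPU per entry of limits; return the private totals."""
    threads = [threading.Thread(target=processor, args=(b, limits[b]))
               for b in range(N_CPUS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return [MEMORY[private_addr("TOTAL", b)] for b in range(N_CPUS)]
```

All CPUs execute the same code and name the same variables, yet each touches only its own cell of each area, so the loop counters do not interfere.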

Public storage is handled in the more familiar manner
of ordinary FORTRAN.
(iii)   Looping. 

It  is  useful  to  define  two  new  forms  of  parallel  DO 
statements  and  to  note  a  modification  in  the  meaning  of  the 
ordinary  DO  statement. 

Our  first  new  'parallel  DO'  statement  form  is 

(a)  DOP S(A) J = IN, FIN, ST

Here S and A are labels. DOP starts the CPUs executing it
with consecutive initial loop-count values (i.e. J = IN, IN+ST, ...).
Any CPU whose assigned initial value J exceeds FIN is shunted off
to statement A; one CPU is allowed to fall through statement S.
The  second  new  'parallel  DO'  statement  form  is 

(b)  DOP S J = IN, FIN, ST

This form of DOP operates exactly like the other except that
control of any CPU whose assigned initial value J exceeds FIN is
transferred to the first statement past the loop.

(Footnote: Every CPU is given a unique number; this identifier is
generally held in internal index register A0.)

Note  that  execution  of  a  statement  of  either  of  these 
forms  sets  up  a  counter  at  the  head  of  the  loop.   The  counter 
is  decremented  at  the  close,  in  order  to  allow  the  proper 
termination  of  the  loop. 

Note  also  that  when  an  ordinary  DO  is  used: 

(a)  if  the  DO  variable  is  PUBLIC,  care  must  be  taken  in 
order that only one CPU executes the loop at one time.

(b)  if  the  DO  variable  is  PRIVATE,  the  loop  may  be  executed 
simultaneously  and  independently  by  any  number  of  CPUs.   Such 
execution  causes  each  of  the  CPUs  to  execute  the  loop  with  the 
same  range  of  values  given  to  the  variable.   As  the  loops  are 
not  coordinated,  explicit  coordination  must  be  implemented 
elsewhere  in  the  program  as  necessary. 

(iv)   Parallel  execution  control  statements. 

The  PFORTRAN  language  includes  several  verbs  which  enable 
the  user  to  coordinate  the  parallel  execution  of  portions  of  code 
in  a  convenient  manner.   These  statements  are  LOCK,  UNLOCK, 
LOCK(K),  UNLOCK(K)  and  PAR  N,S. 

LOCK allows only one CPU to enter the code which follows it.

UNLOCK opens the lock appearing most recently in the program.
Unlabeled LOCK and UNLOCK statements are thus required to form
a nested set.

LOCK(K) operates much as the simple form of the LOCK statement.
Since a parameter is explicitly provided, however, 'unlocking'
from a remote point of the program is possible.

UNLOCK(K) has a self-evident meaning.

PAR N,S allows not more than N CPUs to enter the following
program; if more than N processors execute the PAR statement,
control of the N+1-st, N+2-nd, etc., processors is
transferred to statement S.

UNLOCK S resets the PAR filter appearing as statement S.
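
One plausible way to build the PAR filter from a replace-add counter is sketched below in Python. The text does not give the compiled code, so the whole construction is an assumption: the names `ParFilter`, `enter`, `reset` and `run_through_filter` are hypothetical, and UNLOCK S is taken to mean that the counter is set back to zero.

```python
import threading


class ParFilter:
    """Hypothetical model of PAR N,S built on a replace-add counter."""

    def __init__(self, n):
        self.n = n
        self._count = 0
        self._guard = threading.Lock()

    def _newval(self, j):
        with self._guard:
            self._count += j
            return self._count

    def enter(self):
        # True: fall into the code following PAR. False: go to statement S.
        return self._newval(1) <= self.n

    def reset(self):
        # Assumed meaning of UNLOCK S: admit N processors afresh.
        with self._guard:
            self._count = 0


def run_through_filter(par, n_procs):
    """Send n_procs simulated CPUs at the filter; report who got through."""
    admitted, diverted = [], []

    def processor(badge):
        (admitted if par.enter() else diverted).append(badge)

    threads = [threading.Thread(target=processor, args=(b,))
               for b in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return admitted, diverted
```

With N set to 1 the same counter behaves like a simple LOCK that diverts latecomers instead of making them wait.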

(v)    The basic NEWVAL function, whose meaning is explained
in the preceding section, is provided as a built-in function
of the PFORTRAN language.

(vi)   Two system verbs are provided to enable PFORTRAN user
programs to communicate with the simulated operating system
described in the following section. Within this simulated
operating system, compiled programs will be operated in a
time-sharing environment. They will have the ability to request
CPUs from and to return CPUs to the system. The two verbs that
facilitate this user-system communication are REQUEST and RELEASE.

REQUEST has the form

      REQUEST N1, ST1, N2, ST2, ...

where the Ni are integers and the STi are statement numbers. When
this verb is executed, the system is requested to provide N1 CPUs
to begin execution at ST1, N2 to begin execution at ST2, etc.
The RELEASE statement has the form

      RELEASE

Any CPU executing this verb will return to the operating system.

6.    The  simulated  ATHENE  operating  system.   Outline. 

The purpose of that portion of the operating system which
has been written so far is (1) to re-distribute processors, as
they become available, to various tasks attempting to run in
the simulated operating environment; (2) conversely to handle
the assignment of tasks to processors; and (3) to accept and
manage the I/O for all processors.

Executive  control  resides  temporarily  with  any  CPU 
executing  a  portion  of  the  resident  program  itself,  rather  than 
permanently  or  semi-permanently  with  a  particular  CPU.   Any 
number  of  CPUs  may  be  executing  portions  of  the  resident  program 
at the same time; little or no waiting for other CPUs to
terminate will occur.

The tables used are a 'joblist', and a 'tasklist' for
each job in the joblist (see Figure 1). The 'joblist' contains
a slot for each job currently in the machine, and each slot
contains a subfield giving the total number of tasks currently
available for that job. As tasks are added and removed, these
totals are incremented and decremented via the RAD (replace add)
instruction, which avoids unnecessary waiting almost completely. Each
'tasklist' describes all the unassigned parallel tasks for a
given job on the joblist, and contains an address for each task
currently available for execution. When a task is picked up
by a CPU, it is removed from its tasklist. Additions to
tasklists are caused by processors assigned to a given job
executing REQUEST verbs. There are separate pointers for
each of these tasklists, indicating a next task addition and
a next task removal address. Both addition and removal can be
done concurrently with no waiting time.

The method employed may be described as follows.
(See flow chart, Figure 2.)  (1)  Each processor, on executing
a RELEASE, checks that its exit from a job has not reduced the
number of processors for that job to below the minimum assigned
by the user. If this is not the case the processor finds the
joblist slot of the highest priority job and decrements the total
count of the number of tasks available within this job by 1,
using the RAD machine instruction. The CPU then picks up the
remove pointer for the appropriate tasklist via the RAD and
EXITs to that task. Note that even if two or more CPUs execute
this section of the executive code simultaneously, no waiting
time will be forced, and each CPU will get a unique task from
the list. The tasklist for each job is circular and is
(arbitrarily) 100 octal words long. The tasklist pointer,
after being incremented by the RAD instruction, is masked with
77 (octal), thus obtaining circularity without any delay for
resetting the pointer every time it reaches the end of the list.
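
The removal step can be sketched in Python as follows. The names `Cell`, `Job`, `take_task` and `drain` are hypothetical, threads stand in for CPUs, and two details are assumptions of the sketch rather than statements of the text: the remove pointer is taken to hold the slot most recently removed, and a CPU that finds the count exhausted restores it with a second replace-add.

```python
import threading

LIST_WORDS = 0o100   # tasklist length: 100 octal = 64 words
MASK = 0o77          # keeps the right-most six bits of a pointer


class Cell:
    """One PUBLIC word supporting the RAD (replace add) operation."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()

    def rad(self, j):
        with self._guard:
            self.value += j
            return self.value


class Job:
    def __init__(self, tasks, first_slot=0):
        self.tasklist = [0] * LIST_WORDS
        for k, task in enumerate(tasks):
            self.tasklist[(first_slot + k) & MASK] = task
        self.available = Cell(len(tasks))        # joblist count of tasks
        self.remove_ptr = Cell(first_slot - 1)   # slot most recently removed

    def take_task(self):
        # Claim one task from the count; undo the claim if none was left
        # (this underflow handling is an assumption of the sketch).
        if self.available.rad(-1) < 0:
            self.available.rad(1)
            return None
        slot = self.remove_ptr.rad(1) & MASK     # circular, no reset needed
        return self.tasklist[slot]


def drain(job, n_procs):
    """Let n_procs simulated CPUs remove tasks until none remain."""
    taken = []

    def processor():
        while True:
            task = job.take_task()
            if task is None:
                break
            taken.append(task)

    threads = [threading.Thread(target=processor) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(taken)
```

In the usage checked here the tasks straddle the end of the list (slots 76 and 77 octal, then 0, 1, 2), and the mask alone carries the pointer around the end.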

If no limited CPU-number job requires any additional
processors, a processor entering the system may either
execute the system job rollin-rollout code, or it may
go to a task that can accept an indefinite number of CPUs.

In  real-time  emergencies,  or  for  efficiency,  or  when  a 
job aborts, it may be necessary for a processor to steal other
processors  from  a  job  via  the  forced  EXCHANGE  EXIT  machine 
instruction.   Aside  from  housekeeping,  there  are  no  differences 
between  tasks  to  be  entered  with  an  EXJ  and  those  entered  with 
a  normal  XIT. 

(2)  To execute the REQUEST verb, a processor EXITs to the
resident code, and increments its job's tasklist add pointer,
using the RAD instruction, a number of times equal to the
number of tasks it is going to add to the tasklist. It then
stores the addresses of the tasks on the tasklist.
Other additions to the list can be going on concurrently and
no waiting time is forced.
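
The addition step can be sketched in the same style. The names `Cell`, `TaskList`, `request` and `concurrent_requests` are hypothetical, threads stand in for processors, and two details are assumptions of the sketch: the add pointer holds the slot most recently reserved, and the count of available tasks is raised only after the addresses have been stored.

```python
import threading

LIST_WORDS = 0o100   # tasklist length: 100 octal = 64 words
MASK = 0o77


class Cell:
    """One PUBLIC word supporting the RAD (replace add) operation."""

    def __init__(self, value):
        self.value = value
        self._guard = threading.Lock()

    def rad(self, j):
        with self._guard:
            self.value += j
            return self.value


class TaskList:
    def __init__(self):
        self.slots = [0] * LIST_WORDS
        self.add_ptr = Cell(-1)     # slot most recently reserved
        self.available = Cell(0)    # count of tasks ready to be removed

    def request(self, addresses):
        # Reserve one slot per task with repeated replace-adds ...
        reserved = [self.add_ptr.rad(1) & MASK for _ in addresses]
        # ... then store the task addresses into the reserved slots.
        for slot, address in zip(reserved, addresses):
            self.slots[slot] = address
        # Assumed ordering: announce the tasks only once they are stored.
        self.available.rad(len(addresses))
        return reserved


def concurrent_requests(n_procs, tasks_each):
    """Have n_procs simulated processors each add tasks_each tasks."""
    tasklist = TaskList()
    reserved = [None] * n_procs

    def processor(badge):
        addresses = [badge * 100 + k + 1 for k in range(tasks_each)]
        reserved[badge] = tasklist.request(addresses)

    threads = [threading.Thread(target=processor, args=(b,))
               for b in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return tasklist, reserved
```

Since no two replace-adds return the same value, concurrent requests reserve disjoint slots and no processor overwrites another's entries.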

(3)  The  user  assigns  a  run-invariant  output  grouping  control 
type  to  his  job,  to  control  the  arrangement  of  his  output. 
Output  may  be  grouped  by  processor,  by  task,  or  chronologically, 
the  necessary  arrangement  of  output  files  being  accomplished  by 
the  system  code. 

All output is transferred unconverted to one of two dual
buffers. When a processor finds a buffer full, it begins
converting its contents and sending them to the system I/O
control processors for physical output. Thus, only one
processor is required for output conversion and I/O processing.
Other job-attached CPUs are freed immediately for further
computation. The likelihood of forced waiting during the
dumping of buffers is thereby reduced relative to what would
have been expected if separate buffers for each job or even
for each task or processor had been maintained.

7.     The  simulated  ATHENE  operating  system.   General 
discussion  of  operating  system  problems. 

Any  operating  system  for  Athene-type  parallel  machines 
must  solve  the  following  problems: 

(a)  how  to  route   the  CPUs  between  several  jobs  in 
a  minimum  of  time. 

(b)  how  to  establish  a  job  mix  that  keeps  all  CPUs 
occupied. 

(c)  how  to  define  and  implement  priority. 

(d)  how  to  optimize  throughput. 

First, as to throughput. It seems advisable to guarantee
each job a minimum number of CPUs, to let the job itself
specify what this minimum shall be, and to allow this minimum
to be changed dynamically.

In  the  present  system,  the  minimum  may  be  changed  each 
time  a  request  for  processors  is  made.   The  minimum  is  checked 
each time a processor is returned to the system. If the minimum
is  too  high  relative  to  the  other  jobs,  the  job  is  rolled  out. 

Second, as to useful techniques for CPU routing.
Two methods have been studied and coded. Method 1: maintain a
queue of tasks for each job in the machine. A pointer to this
list (both the queue and the pointer are public variables) is
picked up via the RAD instruction by the individual processors
acting independently. Each processor is thus assured a unique
task from the list. In adding tasks to this list
(simultaneously, of course), a processor uses another pointer which it
updates by an amount equal to the number of tasks being added
(using the RAD instruction). The list is circular and overflow
is detected by noticing whether the upcoming slot is zero or not.
In addition, an index of jobs is kept (and ordered by priority).
On this index one maintains the total number of tasks available
on the corresponding tasklist. Maintenance of this total without
waits or lockouts is accomplished by the RAD instruction.

Method 1, as described so far, involves no waiting.
In an earlier version, it was necessary for the processors to
wait whenever the pointers were reset to the top of the
tasklist so that the list could be used circularly. In the present
version, however, after the execution of a RAD instruction,
the right-most six bits are extracted by masking (since the list is 100
octal words long) and these six bits provide the pointer. The
circularity is thus automatic.

When priorities are changed, however, or when new jobs
are rolled in, the master index must be re-ordered by priority,
and this involves waiting time.

To  deal  with  this  problem  a  second  method  was  devised 
(Method  2).   In  this  method,  separate  lists  are  maintained  for 
each  priority  level.   The  lists  operate  as  in  method  1,  except 
that  tasks  are  added  to  the  list  corresponding  to  the  job's 
current  priority.   It  is  thus  easy  to  change  priorities  and 
there  is  no  necessity  ever  to  re-order  a  list. 

The  difficulty  with  this  method  is  that  when  a  job  aborts 
or  is  rolled  out  temporarily,  it  is  especially  tedious  to  search 
for  and  remove  all  the  outstanding  tasks  from  the  various  lists 
on  which  they  may  appear. 
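
The trade-off of Method 2 can be made concrete with a small Python sketch. The class and method names below are hypothetical, and an ordinary double-ended queue replaces the RAD-driven circular list; the point is only that a priority change costs nothing, while removing a job means scanning every list.

```python
from collections import deque


class PriorityTaskLists:
    """One task list per priority level; level 0 is the highest priority."""

    def __init__(self, levels):
        self.lists = [deque() for _ in range(levels)]

    def add(self, job, task, priority):
        # Tasks go on the list for the job's current priority, so a change
        # of priority never requires re-ordering anything.
        self.lists[priority].append((job, task))

    def next_task(self):
        for tasks in self.lists:          # highest priority first
            if tasks:
                return tasks.popleft()
        return None

    def remove_job(self, job):
        """Abort or roll-out: every list must be searched for the job."""
        removed = 0
        for level, tasks in enumerate(self.lists):
            kept = deque(entry for entry in tasks if entry[0] != job)
            removed += len(tasks) - len(kept)
            self.lists[level] = kept
        return removed


lists = PriorityTaskLists(levels=3)
lists.add("A", "a1", 2)
lists.add("B", "b1", 1)
lists.add("A", "a2", 0)       # job A has been promoted; no re-ordering needed
first = lists.next_task()
removed = lists.remove_job("A")
after = lists.next_task()
```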

A  third  general  desideratum  is  keeping  the  machine  busy. 
Some  algorithms  can  use  a  large  variable  number  of  CPUs,  so
that  the  number  which  may  usefully  be  assigned  to  such  a  job 
is  for  practical  purposes  unlimited.   If  there  is  at  least  one 
job  in  the  machine  with  such  an  unlimited-processor  task,  then 
all  the  CPUs  will  be  busy.   It  is  thus  important  to  know  which 
jobs  have  such  tasks  available  and  when  they  will  be  available. 
In  this  connection  it  is  important  to  know  how  plentifully  such 
algorithms  will  occur  in  a  normal  installation  job-stream. 

A  number  of  problems  are  common  to  both  of  these  methods. 

As the number of jobs and the number of CPUs increase,
the housekeeping problems involved in storage and retrieval of
the exchange packages increase. On the other hand, it is not
clear that EXCHANGE EXITs are needed except in cases of aborts
or roll-outs. (No EXCHANGE EXITs have been used so far in the
present systems.) Priority considerations, however, may require
that  a  processor  should  steal  other  CPUs  via  the  EXCHANGE  EXIT 
instruction.   This  may  be  done  when  the  queue  for  a  high- 
priority  job  becomes  too  long,  or  when  a  low-priority  job  has 
accumulated  more  CPUs  than  it  should  have,  given  its  priority. 
A  difficulty  arises  in  this  connection  in  the  use  of  unlimited- 
processor  tasks.   If  these  tasks  are  long  ones,  it  may  be  more 
appropriate  for  some  other  task  to  obtain  the  processors  so 
that  reassignment  would  be  indicated.   However,  it  is  possible 
that  the  same  processor  may  work  on  more  than  one  task  within 
the  same  job.   If  this  processor  is  interrupted  too  frequently,
then  the  number  of  exchange  packages  per  job  may  exceed  the 
number  of  CPUs  in  the  machine.   Yet,  priority  considerations  do 
not  seem  to  allow  one  to  require  that  a  processor  come  back  to 
take  up  the  interrupted  task  before  doing  anything  else. 

No  provision  has  yet  been  made  in  this  system  for 
cancelling  tasks,  though  of  course  it  is  desirable  for  the  user 
to  be  able  to  request  the  execution  of  a  task  and  later  withdraw
it,  either  before  it  has  begun  execution  or  afterward. 

If a group number register were included in the machine
hardware, it would be possible for a number of processors to
be re-apportioned simultaneously. Suppose all the processors
on one job or task are assigned the same group number. A
system CPU could join that group and jump all processors of that
group to a common address (this jump would be like a return
jump). At that address, the processors, using the RAD
instruction, could split themselves into various streams, so
that all but a certain number could return to where they were;
the rest could return to the system to find out what tasks
they are scheduled to perform.
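
A sketch of the proposed split, in Python, may make the idea clearer. No such group register exists in the simulated machine, so everything here is hypothetical: `split_group` is an invented name, the arrival order of the processors is simulated sequentially, and the counter stands in for a public word that each arriving processor would update with RAD.

```python
from itertools import count


def split_group(group, keep):
    """Divide the processors of one group into two streams.

    Each processor draws a ticket from a shared counter (the role RAD is
    assumed to play). The first `keep` tickets return to the interrupted
    job; the remaining processors return to the system for reassignment.
    """
    ticket = count()
    resume, to_system = [], []
    for cpu in group:
        if next(ticket) < keep:
            resume.append(cpu)
        else:
            to_system.append(cpu)
    return resume, to_system


resume, to_system = split_group([3, 7, 9, 12, 15], keep=2)
```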

A  number  of  questions  arise  in  connection  with  output
in  a  parallel  environment: 

(a)  should  there  be  standard  alternative  ways  of  organizing 
the  output  from  a  given  job; 

(b)  should  outputting  (printing)  from  public  variables  be 
allowed; 

(c)  should  the  data  be  converted  before  it  is  put  into  the 
output  buffer  by  the  individual  processors,  or  afterward;

(d)  should  the  type  of  organization  of  the  output  be  run- 
invariant  or  be  changeable  by  the  programmer; 

(e)  should  non-standard  types  of  organization  be  possible 
to  the  programmer? 

In the present system, there are three standard types
of organization of the output: (1) by processor, (2) by task,
(3) by chronological order. The type of organization selected
by the user is run-invariant.

Instead of having separate output buffers for each
processor or for each task, it was decided to have one job
master buffer. Some of the reasons for this are: a master
buffer would waste less space. Moreover, providing an output
buffer for each task within a job would create substantial
delays while the various buffers were being dumped. More
than one CPU would have to wait for the I/O controller to
become available to do the physical output.
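
One way to picture a single master buffer serving all three standard organizations is the following Python sketch. The record layout and the names `Record`, `MasterBuffer`, and `organized` are assumptions made for illustration; the text does not describe the actual buffer format.

```python
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Record:
    seq: int          # order of arrival in the master buffer
    processor: int
    task: str
    text: str


class MasterBuffer:
    """A single job-wide buffer; the organization is chosen when dumping."""

    def __init__(self):
        self._seq = count()
        self.records = []

    def write(self, processor, task, text):
        self.records.append(Record(next(self._seq), processor, task, text))

    def organized(self, mode):
        keys = {
            "processor": lambda r: (r.processor, r.seq),
            "task": lambda r: (r.task, r.seq),
            "chronological": lambda r: r.seq,
        }
        return [r.text for r in sorted(self.records, key=keys[mode])]


buf = MasterBuffer()
buf.write(2, "T1", "p2 starts T1")
buf.write(1, "T2", "p1 starts T2")
buf.write(2, "T2", "p2 joins T2")
buf.write(1, "T1", "p1 ends T1")
```

Every record carries its processor, its task, and its arrival order, so any of the three organizations can be produced from the same buffer when it is dumped.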

No  provisions  for  non-standard  types  of  organization 
have  been  made  in  the  present  system. 

Output  from  public  variables  seems  somewhat  problematic. 
As  far  as  FORTRAN  is  concerned,  it  is  not  possible  to  indicate 
to  the  programmer  that  once  one  processor  begins  to  execute  a 
PRINT  statement,  it  may  not  be  safe  to  use  the  public  variables 
in  the  FORTRAN  I/O  list  until  the  entire  PRINT  statement  has
been  completed.   Such  a  convention  would  result  in  unnecessary 
waiting  time  (especially  if  the  data  is  converted  at  that  time). 
Yet  since  it  is  sometimes  necessary  to  print  out  public  arrays, 
it  seems  best  to  leave  the  decision  explicitly  to  the  programmer. 

8.     Effective  organization  of  parallel  program  flow. 

In  order  to  make  effective  use  of  available  computer  power, 
programs  that  are  to  be  'parallelized'  should  be  set  up  so  that 
only  a  minimum  of  time  is  lost  waiting  for  intermediate  results. 
If  waiting  processors  are  to  be  RELEASEd  and  then  reREQUESTed, 
the  waiting  periods  should  consist  as  much  as  possible  of  a  small 
number  of  long  gaps  rather  than  a  large  number  of  short  ones,  to 
minimize the costs inherent in switching CPUs from job to job.

To accomplish these goals, it is useful to do a
preliminary analysis of the PERT type (either by hand or
computationally) of a program to be made parallel. The results
of such an analysis would describe the following aspects of the
overall program structure:

(1)  dependencies  of  significant  computations  on  previous 
results; 

(2)  portions  of  the  program  that  can  be  done  in  parallel; 

(3)  maximum number of CPUs that can be usefully employed
at each stage of a program run;

(4)  an estimate of execution time for the various program
segments.
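
A computational analysis of the kind listed above might begin as in the following Python sketch, which finds the critical path from items (1) and (4). The function name, the segment names, and the durations are all hypothetical, and the dependency graph is assumed to contain no cycles.

```python
def critical_path(durations, depends_on):
    """Return (total time, list of segments) for the longest dependency chain.

    durations:  {segment: estimated execution time}
    depends_on: {segment: [segments whose results it needs]}
    """
    finish = {}

    def earliest_finish(segment):
        if segment not in finish:
            start = max(
                (earliest_finish(p) for p in depends_on.get(segment, ())),
                default=0,
            )
            finish[segment] = start + durations[segment]
        return finish[segment]

    for segment in durations:
        earliest_finish(segment)

    # Walk back from the last segment through its latest-finishing
    # predecessor at each step.
    path = [max(finish, key=finish.get)]
    while depends_on.get(path[-1]):
        path.append(max(depends_on[path[-1]], key=finish.get))
    path.reverse()
    return finish[path[-1]], path


durations = {"read": 2, "a": 5, "b": 3, "merge": 1}
depends_on = {"a": ["read"], "b": ["read"], "merge": ["a", "b"]}
total, path = critical_path(durations, depends_on)
```

In this example segments `a` and `b` may run in parallel, but only `a` lies on the critical path; `b` has two time units of slack.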

This information can then be used in designing an
efficient parallel program layout by a judicious placement of
LOCKs, UNLOCKs, PARs, REQUESTs, and RELEASEs, in accord with
the following set of rules:

(i)    REQUEST the maximum number of CPUs available for those
portions of the critical computation path that can be performed
in parallel.

(ii)   Within program segments that are of relatively long
duration and lie on the critical path, use a combination of PAR
and RELEASE to return the idle CPUs to the operating system.

(iii)  Within critical program segments of short duration use
LOCK and/or PAR to temporarily idle the unnecessary CPUs.

(iv)   Subtasks that do not lie on the critical path do not justify
the use of more than one CPU, so that CPUs beyond the first
should be LOCKed out of these subtasks.

(v)    The task assignment structure should be so set up as
to allow CPUs that have finished one particular subtask to
start on a concurrent task that has not yet been completed.

A rudimentary automatic 'parallelization' is currently
being planned. This mechanism will examine DO loops for
inherent parallelism and arrange for the DO loop of largest
possible range within each set of nested DOs to be done in
parallel. The compiler will examine the body of each DO loop
to determine whether at each stage in its iterative execution
currently computed values are independent of prior traversals
of the loop. If this is found to be the case, search continues
to the containing DO and so forth. On completion of the loop
analysis, the compiler will generate an appropriate REQUEST
followed by a DOP. A RELEASE of all but one CPU will be placed
by the compiler at the end of the loop.
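
The planned analysis is not specified further, so the following Python sketch shows only one deliberately conservative test of the kind described. It assumes that every array reference has the form NAME(I+k) for the loop variable I and a constant k; the function names and the (name, offset) representation are hypothetical.

```python
def iterations_independent(writes, reads):
    """Conservative test that no traversal depends on another traversal.

    writes, reads: iterables of (name, offset). An integer offset k denotes
    the element NAME(I+k); an offset of None denotes a scalar variable.
    """
    written = {}
    for name, offset in writes:
        if offset is None:
            return False      # a scalar is assigned on every traversal
        written.setdefault(name, set()).add(offset)
    if any(len(offsets) > 1 for offsets in written.values()):
        return False          # two traversals assign the same element
    # An assigned array may be read only at the element assigned
    # in the same traversal.
    return all(
        name not in written or offset in written[name]
        for name, offset in reads
    )


def widest_parallel_loop(nest):
    """nest: (writes, reads) per loop, innermost first.

    Returns the index of the outermost loop reached before the test
    fails, or None if even the innermost loop is rejected.
    """
    chosen = None
    for depth, (writes, reads) in enumerate(nest):
        if not iterations_independent(writes, reads):
            break
        chosen = depth
    return chosen


# A(I) = B(I) + C(I)
elementwise = iterations_independent({("A", 0)}, {("B", 0), ("C", 0)})
# A(I) = A(I-1) + B(I)
recurrence = iterations_independent({("A", 0)}, {("A", -1), ("B", 0)})
# S = S + A(I)
summation = iterations_independent({("S", None)}, {("S", None), ("A", 0)})
```

A test this simple rejects some loops that could safely run in parallel (a summation, for instance, could be handled by other means); it errs on the side of leaving a loop serial.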

9.    A  qualitative  account  of  various  parallel  programming
experiments.

The  few  modifications  to  Fortran  mentioned  in  Section  5 
above  have  the  fortunate  consequence  that  a  large  variety  of 
available  Fortran  programs  can  be  reprogrammed  to  run  in 
parallel  with  a  minimum  of  effort.   We  have  programmed  a 
variety  of  problems  using  this  technique.   A  sampling  of  the
programs  considered,  with  brief  comments  on  each,  follows.

(a)  Matrix  multiply.   Made  parallel  by  assigning  each 
available  processor  a  row.   Trivial,  by  conversion  of  outer 
DO  loop  of  standard  procedure  to  parallel  DO  loop. 

(b)  Internal  sort  of  array.   Performed  by  insertion  of 
successive  elements  into  pre-sorted  set  of  elements,  using 
binary  search.  (This  sort  requires  no  supplementary  storage.) 
Made  parallel  by  making  the  string  a  two  dimensional  array, 
giving  each  processor  a  row  to  sort  and  varying  the  dimensions 
so  that  the  rows  being  sorted  increase  in  length. 

(c)  Parallel  minimax  search  (Karp-Miranker  algorithm,
cf.  [3])  for  locating  maximum  of  a  function.   Each  processor 
was  assigned  a  point  for  the  computation  of  the  function  in 
each  of  the  successively  smaller  intervals  established  in  the 
algorithm. 

(d)  'Parallel  shooting'  method  for  solving  two-point  boundary 
value  problem  for  ordinary  differential  equations  (algorithm  due 
to  H.  B.  Keller).   Made  parallel  by  assigning  a  processor  to  the 
computations  of  each  subinterval  into  which  the  full  interval  of 
the  problem  is  divided. 

(e)  Calculation  of  eigenvalues  of  complex  matrix  by  the  QR 
method.   This  program  was  developed  by  adaptation  from  a  complex 
eigenvalue  library  routine  (QREIGEN)  used  at  the  Courant  Institute 
Computing  Center.   It  was  made  parallel  by  assigning  row  and 
column operations to individual processors and by the parallel
computation of the components of sums and products.

(f)  Monte Carlo programs for atomic energy level calculations.
These programs, developed recently in connection with other
research efforts at our Computing Center by Dr. M. Kalos, lent
themselves very readily to parallel treatment. For these
programs it was arranged that each processor be given a unique
set of random numbers and that the computations and list
building be done completely in parallel.

Existing FORTRAN programs whose underlying logic makes them
amenable to parallel treatment may readily be converted to
parallel programs; this conversion is not substantially more
difficult than any other sort of program modification.
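
Item (a) above, for instance, corresponds to the following Python sketch, in which the outer loop over rows is handed to a pool of workers. Python threads merely stand in for the processors of the simulated machine, and the names `row_times_matrix` and `parallel_matmul` are hypothetical; the sketch shows the decomposition, not a performance technique.

```python
from concurrent.futures import ThreadPoolExecutor


def row_times_matrix(row, b):
    """Inner loops of the standard procedure, unchanged."""
    return [
        sum(x * b[k][j] for k, x in enumerate(row))
        for j in range(len(b[0]))
    ]


def parallel_matmul(a, b, processors=4):
    """The outer DO loop over rows becomes a parallel loop: one row per worker."""
    with ThreadPoolExecutor(max_workers=processors) as pool:
        return list(pool.map(lambda row: row_times_matrix(row, b), a))


product = parallel_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
```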

Our  experience  may  be  summarized  as  follows. 

(a)  Parallel programs may be written in the modified
FORTRAN described above. The resulting programs are similar
in appearance to, and often not significantly more complex than,
single processor programs accomplishing the same calculations.

(b)  Programs in which the bulk of processor time is spent in
a DO loop which may be performed in parallel are of not
infrequent occurrence. These may be converted to run in parallel
with good efficiency by the trivial expedient of changing
appropriate serial DO loops to parallel DO loops. This may be
done by giving unique loop variable values to each
executing processor, in those cases where the computation is
carried out with different values of the variable; alternatively,
this may be done by simultaneously varying the loop variable
in those cases where computations are carried on in parallel
over the same variable range, as in the sort, the Monte Carlo
computation, and the parallel shooting calculation mentioned
above. Finally, we note that DO loops may be made parallel
by any combination of these two methods.

We  make  one  additional  remark  on  simulation-compilation 
technique.   Since  the  PFORTRAN  compiler  has  not  been  available 
for  our  early  parallel  programming  efforts,  a  provisional 
scheme  based  on  the  use  of  the  existing  6600  FORTRAN  compiler
and  on  a  simple  modification  of  the  SOAPSUDS  simulator  was 
employed.   In  this  provisional  technique,  private  storage  is 
provided  for  the  compiled  programs  in  terms  of  the  following 
convention:   variables  and  arrays  occurring  in  FORTRAN  programs, 
and  not  assigned  to  FORTRAN  'numbered  COMMON',  are  taken  to  be 
PRIVATE  to  each  processor  executing  the  FORTRAN  statement. 
On  the  other  hand,  variables  and  arrays  assigned  to  FORTRAN 
'numbered  COMMON'  are  taken  to  be  PUBLIC  to  all  processors, 
so  that  values  of  such  variables  set  by  any  processor  will 
subsequently  be  used  by  any  other  processor. 

Implicit  'temporary  storage'  locations  created  by  the 
FORTRAN  compiler  are  private  to  each  processor,  in  the  sense 
explained  above. 
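
The storage convention just described has a close analogue in Python threads, sketched below. The analogy is ours and is offered only as an illustration: a shared dictionary plays the role of numbered COMMON (PUBLIC), thread-local storage plays the role of all other variables (PRIVATE), and the names used are hypothetical.

```python
import threading

public = {"total": 0}          # role of variables in numbered COMMON
public_lock = threading.Lock()
private = threading.local()    # each thread sees its own independent copy


def processor(ident, seen):
    private.scratch = ident * 10      # invisible to every other thread
    with public_lock:
        public["total"] += private.scratch   # visible to all threads
        seen[ident] = private.scratch


seen = {}
threads = [threading.Thread(target=processor, args=(i, seen)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_has_scratch = hasattr(private, "scratch")
```

The main thread never assigned `private.scratch`, so it finds no such variable, whereas the value accumulated in `public` reflects the contribution of every worker.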

Intermediate storage is simulated by locking all but
one CPU out of the portions of the program in which intermediate
variables are computed; the locks are written directly in terms
of the basic NEWVAL function.

10.    Quantitative  data  on  computational  efficiency  for  a 
sampling  of  parallel  programs. 

Following is a set of charts which summarize our actual
running experience. Along with the actual observed running
time vs. number of CPUs, we have plotted an efficiency measure
which we have taken to be

\[ E_N = \frac{T_1}{N \, T_N} \]

where

$E_N$ = efficiency for N CPUs
$T_1$ = time required for a serial machine to perform the program
$N$   = number of CPUs
$T_N$ = time required for N CPUs to perform the program.

(The efficiency as here defined may be greater than one in
some applications, e.g. parallel searches, theorem provers, etc.)
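
As a worked illustration of this measure, the following Python fragment evaluates it for a few invented timings. The figures are hypothetical and are not taken from the charts.

```python
def efficiency(t1, n, tn):
    """E_N = T_1 / (N * T_N): serial time over total processor time used."""
    return t1 / (n * tn)


# Hypothetical timings: a job needing 120 time units on one CPU.
ideal = efficiency(120.0, 4, 30.0)       # perfect speed-up on 4 CPUs
typical = efficiency(120.0, 4, 40.0)     # some coordination overhead
superlinear = efficiency(120.0, 4, 20.0) # e.g. a lucky parallel search
```

The three cases give 1.0, 0.75, and 1.5 respectively; the last shows how the measure can exceed one when N processors together do less total work than the serial program.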
[Chart: Complex Eigenvalue Problem where the program and data
configuration was such that more than k CPUs were useless.]

The interference effects of the unnecessary CPUs are quite
apparent. They are caused by delays built into the program
that were used to organize the CPUs.

[Chart: Shockwave Problem cut off at end of (essentially)
parallel computation.]

The initial drop in efficiency was caused by the parallel
coordination coding.

[Chart: Monte Carlo Computation (with queued random number
generator).]

The frequent executions of the random number generator
required by the algorithm caused the increase of running
time when more than eleven CPUs were used. (This could
be remedied by assigning each CPU a different base
number set up in such a way as to insure independence
of the random variables, thereby eliminating the necessity
for the queue.) Note that for fewer than six CPUs the
queue effect is not noticeable.

[Chart: Parallel Sort (showing the effect of additional
CPUs in extreme conditions).]

The remark on Chart I is valid here also. Note further that
the interference effect caused by the organization coding
cancelled the usefulness of additional CPUs even when they
were able to contribute to the procedure in curve (a).
Operational Statistics of a Large Number
of Runs of the Monte Carlo Problem

Table I

Number     Useful Time           Parallel Proc. Overhead    Time Available for Concurrent Jobs
of CPUs    Parallel    Series    Range          Total       Range               Total

5          122.4x10^   5.7x10^   .4-1.3x10^     1.7x10^     4.4x10^ - 5.3x10^   9.7x10^
5          135.3       5.7       .4-0.9         1.3         4.8-5.3             10.1
3          13.2        5.7       .2-.5          .7          5.2-5.7             10.7
3          103.9       5.7       .7-.9          1.6         4.8-5.0             9.8
3          117.1       5.7       .3-.5          .8          5.2-5.4             10.6
3          144.9       59.2      1.7            3.4         57.5                115.0
8          79.2        5.7       .1-2.4         6.4         3.4-5.7             133.5
8          64.8        5.7       .0-.1          .5          5.6-5.7             39.4
16         78.3        5.7       .1-3.1         21.6        2.6-5.6             69.6
16         71.2        5.7       .1-3.7         19.5        2.0-5.6             71.7
16         79.6        5.7       .2-4.3         19.5        1.4-5.5             71.7
16         92.5        5.7       .0-3.5         16.3        2.7-5.7             74.9
11         78.3        61.5      .1-19.5        35.6        32.1-51.4           479.4
5          85.3        5.7       .0-1.3         2.6         4.4-5.7             20.2
5          69.5        53.9      .2-3.3         5.5         50.6-53.7           210.1

Useful time:  The time spent in actual execution of the
algorithm (the sub-headings represent the time spent
in parallel and series computation, respectively).

Overhead:  The time uselessly spent in computation by processors
about to be released, before they were able to execute a
'return to system' instruction.

Time Available for Sharing:  The time in which the processors
executing the algorithm could have usefully been doing
other procedures in a time-sharing environment.
[Graph: the ratio of overhead/CPU to useful time vs. number
of CPUs for the random (Monte Carlo) runs summarized in
Table I. (Note the almost linear relationship.)]

APPENDIX I

TRANSLATION  TECHNIQUES  EMPLOYED  IN  THE  PFORTRAN  COMPILER. 

1.    The  general  strategy  employed  in  programming  the 
PFORTRAN  compiler. 

The  source  language  to  be  analyzed  by  a  compiler  written 
in  the  general  style  which  we  have  employed  may  initially  be 
described  syntactically  in  a  notation  very  much  like  the 
standard  Backus  normal  form.   Such  a  description  implies  a 
recursive  recognition  procedure.   This  recursive  procedure  is 
programmed  in  detail  by  the  manual  transcription  of  the 
syntactic  description  into  a  set  of  calls  on  a  recursive  family 
of procedures, described in more detail in the two following
sections. When a recognition procedure is recursively called,
it may in turn invoke other recognition procedures, as analysis
of the source string syntax progresses. Each of the recognition
procedures will eventually return either successfully or
unsuccessfully. By standard convention, a recognition
procedure is provided with two transfer labels as arguments.
Control will pass to one argument on successful return, to the
other argument on unsuccessful return. Alternatively, if the
successful return argument is zero, control will pass, in case
of success, to the immediately following recognition procedure
call; if the unsuccessful return argument is zero, control
will be returned, on an unsuccessful return from a lower
subroutine level, to the next higher level recognition
subroutine, the indication of failure being carried along
recursively.

The  data  structure  which  carries  all  the  necessary 
return  addresses  for  this  recursive  action  is  a  simple  pushdown 
stack,  called  RECRLIS  in  the  PFORTRAN  compiler.   Two  basic
recursive  recognition  procedure  controllers  are  provided: 
RECR  and  PROD.   The  second  resembles  the  first,  but  includes 
a  mechanism  for  iteration  of  a  procedure,  success  or  failure 
of  the  iteration  depending  on  the  number  of  successful 
returns  made  from  a  lower  level. 
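
The flavor of this recognition scheme can be conveyed by a short Python sketch. It is an analogy under stated assumptions rather than a rendering of the compiler's macros: Python's own call stack plays the role of RECRLIS, a Boolean result replaces the pair of transfer labels, and the names `Scanner`, `exam`, `seq`, and `prod` are hypothetical (only `prod` is modeled on the iteration mechanism attributed to PROD).

```python
class Scanner:
    """The source string as a token list with a movable cursor."""

    def __init__(self, tokens):
        self.tokens = list(tokens)
        self.pos = 0


def exam(keyword):
    """Atomic recognizer: succeed if the next token is `keyword`."""
    def recognize(s):
        if s.pos < len(s.tokens) and s.tokens[s.pos] == keyword:
            s.pos += 1
            return True
        return False
    return recognize


def seq(*parts):
    """Call each recognizer in turn; any failure is carried upward."""
    def recognize(s):
        start = s.pos
        for part in parts:
            if not part(s):
                s.pos = start     # undo partial progress on failure
                return False
        return True
    return recognize


def prod(part, at_least, at_most):
    """Iterate `part`; succeed if it returned successfully often enough."""
    def recognize(s):
        start, successes = s.pos, 0
        while successes < at_most and part(s):
            successes += 1
        if successes < at_least:
            s.pos = start
            return False
        return True
    return recognize


# <EXPR> =: X followed by zero to ten occurrences of "+ X"
term = exam("X")
expr = seq(term, prod(seq(exam("+"), term), 0, 10))

good = Scanner(["X", "+", "X"])
bad = Scanner(["+"])
partial = Scanner(["X", "+"])
results = (expr(good), expr(bad), expr(partial))
```

In the third case the recognizer succeeds after consuming only the first token: the incomplete "+ X" repetition fails, its progress is undone, and zero repetitions satisfy the lower bound.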

When the recursive recognition process reaches the 'atomic'
level, as eventually it must, one or both of two additional
types of procedures, examination procedures and/or generation
procedures, will be employed. An examination procedure examines
an atom either for identity with a keyword of the language or
for membership in some lexical class (such as language keyword,
integer constant, Hollerith constant, integer variable, real
variable, etc.) constituting an atomic type within the language.
Normally, an examination procedure merely returns an indication
of success or of failure. However, one particular examination
procedure, RFIND, which bears the responsibility for recognizing
variable, subroutine, and function names, as well as labels and
integer constants, i.e. for recognizing all semantic entities
which are not keywords of the language, carries out some
significant additional actions. This routine examines the
master hash table (called VARLIST) for prior occurrence of
the variable name. If the name does not occur, it is
entered into the hash table. Moreover, and perhaps more
significantly, RFIND places any element which it recognizes
on the top of an auxiliary pushdown stack (PDL) which
constitutes the main compiler data structure for the
transmission of arguments between recursive routines. Once
placed on this stack, such an element is available for subsequent
use during the translation process.
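
The two actions of RFIND can be sketched in Python as follows. The entry format and the function signature are assumptions; a Python dictionary (itself a hash table) stands in for VARLIST and a list for PDL.

```python
def rfind(name, kind, varlist, pdl):
    """Look the name up, enter it if new, and stack the entry on PDL.

    varlist: dict serving as the master hash table
    pdl:     list serving as the argument pushdown stack
    """
    entry = varlist.get(name)
    if entry is None:
        entry = {"name": name, "kind": kind}   # hypothetical entry layout
        varlist[name] = entry
    pdl.append(entry)
    return True    # indication of success


varlist, pdl = {}, []
rfind("X", "real variable", varlist, pdl)
rfind("N", "integer variable", varlist, pdl)
rfind("X", "real variable", varlist, pdl)    # prior occurrence is found
```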

Generator procedures modify one or another of the global
data tables used in the compilation process. These tables
include the master symbol hash table (VARLIST), the recursive
subroutine argument pushdown stack (PDL), and the pending
DO loop end label table (IDOLIST), as well as other related
tables containing auxiliary information, described in more
detail below.

Among  the  principal  generator  routines  are  ARITH,  which
generates  code  for  logical  and  arithmetic  statement  evaluation; 
RCALL,  which  generates  code  for  function  calls;  DODO,  which 
stacks  DO  labels  on  the  pending  end-label  list  (IDOLIST); 
and  RLAB,  which  generates  the  necessary  code  for  DO  loop  endings 
and  which  checks  the  well-formedness  of  DO  nesting. 

ARITH acts as follows. It is supplied with the number
(1 or 2) of its operands, and with an indicator of the particular
arithmetic or logical operation for which code is to be
generated. It then finds its operands (including information
on operand type, etc.) on the top of the PDL stack, generates
code for the required operation into the output code string,
and puts a designator for the location of its result (core
location or register) back on the top of the PDL stack.

Note  that  in  this  arithmetic  expression  compilation 
technique,  only  operands  and  not  operators  are  stacked  on  PDL. 
Information  concerning  operations  is  of  course  carried 
implicitly  elsewhere,  namely,  in  the  master  control  stack 
(RECRLIS)  of  pending  recognition  subroutine  returns. 
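
A minimal Python sketch of a generator in the style of ARITH follows. The operation names, the tuple form of the generated code, and the temporaries `T0`, `T1`, ... are hypothetical; operand type information and register allocation are omitted.

```python
def arith(operand_count, operation, pdl, code):
    """Pop the operands from PDL, emit code, push the result designator."""
    operands = [pdl.pop() for _ in range(operand_count)]
    operands.reverse()                  # restore left-to-right order
    result = f"T{len(code)}"            # designator for the result location
    code.append((operation, *operands, result))
    pdl.append(result)


# Translation of  -(A + B * C): only operands are ever stacked.
pdl, code = [], []
pdl.extend(["A", "B", "C"])             # placed on PDL as they are recognized
arith(2, "MUL", pdl, code)              # B * C
arith(2, "ADD", pdl, code)              # A + (B * C)
arith(1, "NEG", pdl, code)              # unary minus
```

The order in which `arith` is called is dictated by the recognition routines as they return, which is where the information about operators is implicitly carried.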

When the beginning of a nested function call of the form
...F(...,...,...,...) is encountered, the function symbol F is
placed on the pushdown stack PDL (by RFIND). When the terminating
right parenthesis is encountered, RCALL unloads the PDL
pushdown stack, down to the first entry flagged as a function
name. The elements encountered above this element on PDL
generate the various entries in the target subroutine calling
sequence, which is of standard form; the function name, when
found on the pushdown list PDL, defines the function type
subroutine to which transfer is to be made.
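
The unloading step can be sketched in Python as below. The entry layout, the `is_function` flag, and the form of the emitted call are hypothetical. The sketch also assumes that a designator for the function's result is pushed back on PDL so that nested calls compose; the text does not state this explicitly.

```python
def rcall(pdl, code):
    """Unload PDL down to the first entry flagged as a function name."""
    arguments = []
    while pdl and not pdl[-1].get("is_function"):
        arguments.append(pdl.pop())
    if not pdl:
        raise ValueError("no function name found on PDL")
    function = pdl.pop()
    arguments.reverse()                 # calling sequence in source order
    code.append(("CALL", function["name"], [a["name"] for a in arguments]))
    # Assumption: the result designator replaces the call on the stack.
    pdl.append({"name": "result of " + function["name"]})


def variable(name):
    return {"name": name}


def function(name):
    return {"name": name, "is_function": True}


# G(A, F(B, C)): entries are stacked as they are recognized.
pdl, code = [], []
pdl.extend([function("G"), variable("A"), function("F"),
            variable("B"), variable("C")])
rcall(pdl, code)    # at the parenthesis closing F(...)
rcall(pdl, code)    # at the parenthesis closing G(...)
```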

The DO loop head generator routine DODO generates all
necessary target code belonging to the beginning of a DO loop,
and, in addition, stacks the DO label on the top of the DO stack
(IDOLIST). The generator routine RLAB, which is invoked when
the translation of a labeled statement (with label L) has
been completed, examines the label L for identity with the
label stacked at the top of the DO stack. If such identity is
detected, the DO stack is popped, and target code for a
DO loop end is generated. This process repeats as often as
necessary. If any labels identical to L are found on the DO
stack but not at its top, an 'erroneous DO nesting' diagnostic
is generated.
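
The label bookkeeping of DODO and RLAB can be sketched in Python as follows. Code generation is omitted, and the function signatures and the wording of the diagnostic are hypothetical.

```python
def dodo(do_stack, label):
    """DO loop head: stack the label of the loop's terminal statement."""
    do_stack.append(label)


def rlab(do_stack, label, diagnostics):
    """Labeled statement completed: close every loop ending at `label`.

    Returns the number of DO loop ends that would be generated.
    """
    closed = 0
    while do_stack and do_stack[-1] == label:
        do_stack.pop()
        closed += 1
    if label in do_stack:       # present, but buried under another loop
        diagnostics.append(f"erroneous DO nesting at label {label}")
    return closed


# Well-formed: DO 10 ... DO 20 ... DO 20 ... 20 CONTINUE ... 10 CONTINUE
stack, errors = [], []
for label in (10, 20, 20):
    dodo(stack, label)
closed_at_20 = rlab(stack, 20, errors)
closed_at_10 = rlab(stack, 10, errors)

# Ill-formed: DO 10 ... DO 20 ... 10 CONTINUE
bad_stack, bad_errors = [], []
dodo(bad_stack, 10)
dodo(bad_stack, 20)
closed_bad = rlab(bad_stack, 10, bad_errors)
```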

The various auxiliary data structures employed in a
significant way by the compiler are as follows. A dimension
table DIMTAB contains dimension information for arrays; each
entry is referenced by a basic array name in VARLIST. A subroutine
argument table JARGLST contains a list of all arguments to
the current subroutine, and is used by various generator routines
to determine whether 'internal' or 'external' type target code
is to be generated when a given variable is to be fetched.
This table is cleared whenever an END card is encountered.
A statement function argument table ITARLST plays a similar
role for arithmetic statement functions. Common block names
and other common block information are accumulated in a common
block name table COMTAB. Names of called subroutines and
functions are accumulated in the subroutine name table SUBLIST.
A corresponding local accumulation within the current subroutine
is made in the arithmetic statement function table ARTLST.
Temporary locations for the storage of variables overflowing
registers are accumulated in a temporaries table ITEMP.
Constants and their descriptions are accumulated in two
tables ICONS and ICONSM.

Address  assignments  within  a  subroutine  are  made  only 
on  encountering  the  subroutine  END  card;  this  allows  dimension 
and  other  declaration  statements  to  be  placed  in  a  subroutine 
without  undue  restriction. 

Register  allocation  is  accomplished  during  the  compilation 
process,  a  single  algorithm  governing  the  use  of  the  registers 
being  employed. 

2.     Additional  details  concerning  the  syntactic 
description  employed. 

The  general  syntactical  structure  of  the  PFORTRAN 
language  is  essentially  the  same  as  that  of  ordinary  FORTRAN. 
The  following  shows  that  portion  of  the  syntax  which  describes 
the  IF  statement;  we  use  this  particular  example  to  demonstrate 
our  translation  techniques.   The  syntactic  description  is 
written  in  a  manner  corresponding  exactly  to  its  treatment 
within the PFORTRAN compiler.

<IFSTAT>    =:  IF <IF>
<IF>        =:  (<IFTAIL> | <MISCD>
<IFTAIL>    =:  <LOGEXP>) <STORBR> | <ARITHEXP>) <THRLAB>  1/
                | <IFTAILB>

1/ The distinction between ARITHEXP and LOGEXP is designated
by switches within the primitive operations.

<IFTAILB>   =:  <MISC>
<LOGEXP>    =:  <CONJ> P <LOGEXP1>  1/
<LOGEXP1>   =:  [right-hand side illegible]
<CONJ>      =:  <LOGTRM> P <CONJ1>
<CONJ1>     =:  .AND. <LOGTRM>
<LOGTRM>    =:  .NOT. <LOGELM> | <LOGTR1>
<LOGTR1>    =:  <LOGELM>  2/
<LOGELM>    =:  (<LOGEXP>) | <LOGELMA>
<LOGELMA>   =:  <RELAT>
<RELAT>     =:  <EXPR> | <EXPR> <<RELOP>> <EXPR>  3/
<STORBR>    =:  <TWOLAB> | <STORBRA>
<TWOLAB>    =:  <<STATNO>>, <<STATNO>>
<THRLAB>    =:  <TWOLAB>, <<STATNO>>
<STORBRA>   =:  <STATEMENT>
<EXPR>      =:  + <TREXP> | - <TREXP> | <TREXP>
<TREXP>     =:  [partly illegible; involves <TERM>]
<SGNTRM>    =:  [partly illegible; involves <TERM>]
<TERM>      =:  <FACTOR> P <OFACTOR>
<OFACTOR>   =:  * <FACTOR> | / <FACTOR>

1/ P, written with a subscript M and a superscript N, has the
meaning: the following item occurs at least M times and as many
as N times. [The bounds attached to the individual occurrences
of P are not legible.]

2/ Constructions like these are used occasionally to make
the subsequent code generation somewhat easier.

3/ Double brackets will enclose those primitives
that are not standard operators.

<FACTOR>    =:  <BEXP> ** <BEXP> | <BEXP>
<BEXP>      =:  (<LOGEXP>) | <ARRY> | <<SIMPLE VARIABLE>>
                | (<PAREXP>) | <<CONSTANT>>
<ARRY>      =:  <<ARRAYNAM>> (<SUBSCR>) | <<ARRAYNAM>>
<PAREXP>    =:  <EXPR> P , <EXPR>
<SUBSCR>    =:  <EXPR> P , <EXPR>
<MISCD>     =:  <<ADDUM>> 1/ <CMSTL> | <<QUO>> 2/ <CMSTL>
<CMSTL>     =:  <TWOLAB>
<MISC>      =:  <<EOF>> <<INTEGER>>) <TWOLAB>

The  more  detailed  form  of  the  control  macros  used  in 
encoding  syntax  of  the  above  type  into  executable  form  will  be 
explained  in  the  following  section.   Before  doing  so,  however, 
let us note the uses intended for our principal control macros.

(a)  Simply recursive elements of the syntax (the
items enclosed by "< >" in the preceding syntax) are handled
by the RECR macro.

(b)  Multiply recursive elements (represented in the above
syntax listing by the symbol P followed by a bracketed item) are
handled by the PROD macro.

(c)  Primitive elements (represented in the above syntax
by elements enclosed in double brackets) are detected by
the EXAM macro.

(d)  Target code generation is handled by the GENR macro.


1/  Accumulator overflow.
3.   Additional details concerning the control macros used to
program the syntactic analysis.

The  following  set  of  macros  has  been  found  useful, 
especially  for  expressing  that  part  of  the  compiling  system 
which  scans  the  input  string  (PFORTRAN  program)  and  governs 
its  translation  to  a  completely  explicit  internal  machine  form. 
These  macros  enable  this  major  portion  of  a  compiler  to  be 
programmed  by  what  is  essentially  a  transliteration  of  the 
input  language  syntax  as  given  in  a  slightly  modified  Backus 
notation.   Use  of  this  technique  makes  it  relatively  easy  to 
change the input language.

The form of the translation control macros which we
have employed is as follows.

(i)    The RECR macro.

The RECR macro has the form

RECR     DEF, FAIL, BACK

This macro causes a recursive execution of the macro string
starting at location DEF; i.e. control is transferred to
location DEF and the current location (i.e. the return location)
is placed at the top of a return push-down list. FAIL and BACK
are two code address pointers, which control the action to be
taken on return from a lower level of recursion in the manner
explained in what follows.

The macro string at location DEF returns control to the
current location, either on "successful" termination of the
macro-coded string beginning at DEF, or by the "FAIL"
equivalent if the macro-programmed code beginning at the label
DEF cannot be executed with full success. Note that on
execution of the PROD, EXAM, and TEST macros described below,
a success-failure bit is set which controls the decision to
transfer upon return from a lower recursive level to the BACK
or to the FAIL address.

If BACK = 0 in the RECR macro call, execution continues
in line on successful return from a lower level of recursion.
If BACK = 1 in the macro call, successful return from a lower
level will at once force return to the next higher level with
the success bit still set.

When FAIL = 0 an unsuccessful return from a lower level
of  recursion  forces  return  to  the  next  higher  level  with  the 
success  bit  reset. 

If  FAIL  is  not  zero,  and  control  is  returned  from  a  lower 
level  of  recursion  with  the  success  bit  reset,  control  is 
transferred  directly  to  the  macro  string  beginning  at  location 
FAIL. 
(ii)   The PROD macro.

The PROD macro has the form

PROD      DEF, FAIL, BACK, M, N

PROD causes the recursive execution (in exactly the same
way as RECR) of the macro string beginning at DEF up to N times,
or until a non-successful return is encountered. The exit
procedure is as follows:

(a)  If the number of successful returns from DEF is at least
M, but not more than N, and BACK = 0, continue directly in the
current macro string.

(b)  In the same situation, if BACK = 1, set the success bit
equal to 1 and transfer control to the next higher level of
recursion.

(c)  If the number of successful returns is less than M or more
than N and FAIL is not zero, transfer control directly to location
FAIL.

(d)  In the same situation, if FAIL equals zero, reset the success
bit to 0 and transfer control to the next higher level of recursion.
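
The exit rules (a)-(d) can be restated outside the macro system. The following Python sketch is an illustration only: the function name `prod`, the callable `attempt` (standing in for the macro string at DEF) and the returned action strings are hypothetical and do not appear in the compiler itself.

```python
def prod(attempt, fail, back, m, n):
    """Model of the PROD exit rules (a)-(d).

    attempt -- hypothetical callable standing in for the macro string at DEF;
               returns True for a successful return, False otherwise
    fail    -- label to transfer to on failure, or 0
    back    -- 0 or 1, as in the macro call
    Returns a pair (action, success_bit); success_bit is None when the
    rule does not state a value for it.
    """
    successes = 0
    # Execute DEF up to N times, or until a non-successful return.
    while successes < n and attempt():
        successes += 1
    if m <= successes <= n:
        if back == 0:
            return ("continue", None)        # rule (a): go on in line
        return ("return", 1)                 # rule (b): success bit set
    if fail != 0:
        return ("goto " + str(fail), None)   # rule (c): transfer to FAIL
    return ("return", 0)                     # rule (d): success bit reset
```

Because the loop in this sketch stops after N successes, the "more than N" case of rules (c) and (d) cannot arise here; only the "less than M" case is exercised.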

(iii)  The  EXAM  Macro. 

The  EXAM  macro  has  the  form 

EXAM      ROUT, FAL, BACK, PAR

EXAM causes the execution (as an ordinary, and often
FORTRAN-written, procedure) of the subroutine ROUT, supplying it
with the parameter PAR; the routine returns in the normal way
and either sets or resets the success bit. Thus, for example,
the routine may be one that compares the current item of the
input string with the parameter supplied to it and sets or resets
the success bit on finding equality or inequality.

As noted above, certain particular subroutines called by
EXAM, in particular RFIND, may access the master hash table
(VARLIST) and may place a recognized element on top of the
argument pushdown stack (PDL).

The actions on return from a subroutine called by EXAM
are the same as for the RECR and PROD macros.

(iv)   The  GENR  macro. 

The  GENR  macro  has  the  form 

GENR      ROUT, TRNS, BACK, PAR

GENR causes the execution of the generator subroutine
ROUT, supplying it with the parameter PAR. A generator routine
called in this way will typically generate a portion of the
output string (the object language or some related intermediate
language) and/or update internal tables (e.g. VARLIST or PDL)
using the parameter and the internal tables as data.

On return from a called subroutine ROUT, GENR will:

(a)  return control to the next higher level of recursion
if the BACK bit is set;

(b)  transfer control directly to the label TRNS if BACK = 0
and TRNS is not equal to zero;

(c)  continue in the current string if TRNS equals zero.

Two additional utility macros SET and TEST are also
provided. The SET macro has the form

SET     SW, BIT

SET sets a one-bit switch, switch number SW, to the value
of BIT. It was found in practice that providing a small number
of these switches (less than 100) is quite useful. The TEST
macro has the form

TEST     SW, BIT, BACK, FAL

This tests the numbered switch SW for equality with BIT
and sets or resets the success bit accordingly.

The actions then taken are the same as described for
RECR above.

4.    An example of the compiler macro coding

The following is extracted from the compiler and
accomplishes the translation of IF statements. Labels used
in the extract below generally correspond to those used in
the BNF syntax description for the IF statement given in
Section 2 of this Appendix. The reader is advised to follow
the macro-code below using the material in Section 2 as a guide.

IF        EXAM  RCOMPAR,LP,MISCD,0

PECR  IFtAIL,FPRIF,0 

GENR  RIF,C,STRT,0 

I'^TAIL    RECR  LOGEXD,  IFTAILP.O 

EXAM  RCOMPAR»pn»ERRRP»0 

TEST  2»1»THRLAB,0 

RECR  STORBR»ERP.l 

IFTAILB   RECR  MISCERR.l 


LOGICAL 


LOGICAL 


LOGICAL 


LOGEXP 
LOGEXPl 


COM  J 
CONJl 


LOGTRM 

EXAM 

SET 

PECR 

GEMR 

LOGTRl 

RECR 

LOGELM 

EX^M 

RECR  CONJ»ERP»C 

PROD      LOGEXPl  ♦cTRRQ!?,  1  ,0,  ^  777f 

EXAM  RCOMPAR,OR,0,0 

S^T     2»1 

RECR    CONJ»ERP.O 

GFNR  ARITH,L^''»0,1 

pcrrp    LOGTR^/'»E'=P  »0 

PROD  COMJl ,0,1 ,^,7777° 

EXAM  RCOMPAR ,AND,0,C 

SET     2,1 

PECR      LOGTPM,ERR,0 

GFMP      API TH,LAND»0,  1 

RCOMPAR, NOT  ,  LOGTRl  ,0 
2,1 

LOGEL^',FROOP,0 

ARITH,LNOT,0,1 

LOGELM, ERROR, 1 
RCOMPAR, LP, LOG ELM A ,0 
PECR    LOr-FXP ,  E'^R»0 
rXAv    rpcnu^DAP  ,  i?P,ERR  ,  1 


RELATIONS  -^^--■■* 

LOGELMA 

RECR 

RELAT, ERR, 1 

RELAT 

RECR 

EXPR  ,  FR'^  ,0 

RECR 

PELOP ,5UC,C 

SET 

2,1 

RECP 

FXPR ,ERP,0 

GENR 

RARITH, 0,0,1     .RARITH  S' 

PELOP 

EXAM 

RCMPAR1,RFLIST,0,] 

STOPBR 

DFrp 

TW'^LAB,  STORBRA,  1 

T'A'OLA° 

EXAM 

'5FIND,STATNO,0,0  ," 

EXAM 

DCOMP  AP  ,  Ci^'-'MA  ,  ERR  »  0 

EXAM 

PFIND,STATN0,ERR,1 

THRLA5 

R^CR 

TWOLAB,ERR,0 

EXAM 

RCOMP  AR , COMMA , ERR , 0 

EXAM 

PFIND,5TATN0,FPR ,1 

STORBRA 

GENR 

R I  P  ,  0  ,  0  ,  3 

RE^R 

D  I  ST  1,0,1 

'^ISC 

=^XAM 

RCO-^PAP,EMDF,ERRIF,0 

RECR 

MSCTL,ERR,1 

MISCD 

EXAM 

RCOMPAR,ACCUM,MI 5CE,0 

pprp 

CMSTL,EPR,1 

MI5C^ 

f^XAM 

prpMpi^p  ,QLio,  cppLp  ,  0 

PECR 

CMSTL,EPP,1 

r->    RELATIONS    ******    RELATIONS 
MSCTL     EXAM  RFIND,INTEGER,ERR,0
          EXAM  RCOMPAR,RP,ERR,0
          GENR  RENF,0,0,0
          RECR  TWOLAB,ERR,1

CMSTL     GENR  ROVFL,0,0,0
          RECR  TWOLAB,ERR,1

SUBSCR    RECR  EXPR,ERR,0
          PROD  CEXPR,0,1,0,2

CEXPR     EXAM  RCOMPAR,COMMA,0,0
          RECR  EXPR,ERR,1


EXPR  **********  EXPR  **********  EXPR  **********


EXPR      EXAM  RCOMPAR,PLUS,EXPR2,0

EXPR1     RECR  TREXP,ERR,1

EXPR2     EXAM  RCOMPAR,MINUS,EXPR1,0
          RECR  TREXP,ERR,0
          GENR  ARITH,UMIN,0,1

TREXP     RECR  TERM,0,0
          PROD  SGNTRM,ERR,1,0,8

SGNTRM    EXAM  RCOMPAR,PLUS,SGNA,0
          RECR  TERM,ERR,0
          GENR  ARITH,PLS,0,1

SGNA      EXAM  RCOMPAR,MINUS,0,0
          RECR  TERM,ERR,0
          GENR  ARITH,MIN,0,1

TERM      RECR  FACTOR,ERR,0
          PROD  OFACTOR,0,1,0,8

OFACTOR   EXAM  RCOMPAR,TIMES,OFCTA,0
          RECR  FACTOR,0,0
          GENR  ARITH,TIM,0,1

OFCTA     EXAM  RCOMPAR,DIVIDED,0,0
          RECR  FACTOR,0,0
          GENR  ARITH,DIV,0,1

FACTOR    RECR  BExP,FPP,0 

EXAM  RCOMDAR,EXP^M'^M  ,F.'\rA  ,0 
RECR  3EXP»ERR,0 
GENR  ARITH,P0V.'^P,0,1 


FACA 

GENR' 

sue, 0,0,1 

RECR 

rxPR  (C^PR  ,  O 

EXAM 

PCrf.'DAp  ,rD,rDp  ,  1 

3XA 

RECR 

ARPY,PXn,l 

BXB 

EXAM 

RFIND,SIMPL-  ,PXC 

EXAM 

RCOMPAR, L^,SUCr' 

RECR 

DAREXP,0 ,0,0 

EXAM 

PC0MPAP,PD,0» ] 

PARFXf^  -It *■!<.■»■  ■«■-!<•**■»■   PARPXP  *^^*->*-^**    PAREXP  * 


PAREXP    RECR  EXPR, ERR, 0 

PROD  CEXPR, 0, 1  ,0,? 
BXC       EXAM        RFrND,CN^T,FPP, 1 
ARRY      EXAV!        RFIND,ARRA|n">/,C,^ 

EXAM        RCOMPAR, LP, BXD,':' 

RECR         SURSCR,ERR,0 

GENR  RSUBSC, 0,0,0 

EXAM         RCO^^PAP,RP,FPr',l 
BXD       GENR         PVAR,o,SUC,0 
FUNCTION  ********  FUNCTION  ********  FUNCTION


FNST      EXAM  RFIND,SIMPLE,ERR,0
          EXAM  RCOMPAR,LP,AS1,0
          RECR  PAREXP,ERR,0
          EXAM  RCOMPAR,RP,ERR,0
          GENR  FUNCS,0,0,1

FUNCTX    GENR  FUNC,0,0,0
          RECR  PAREXP,ERR,0
          EXAM  RCOMPAR,RP,ERR,1

BEXP      EXAM  RCOMPAR,LP,BXA,0
          RECR  LOGEXP,0,0,0
          EXAM  RCOMPAR,RP,0,1
5.     Future Instructions.

It is expected that a later version of our parallel
compiler will include the following instructions:

(i)    DATA LOCK A

This forces a wait on all accesses to the variable or array A
until execution of a DATA UNLOCK statement.

(ii)   DATA RELEASE A

This returns any CPU that accesses the variable or array A
to the executive system.

(iii)  DATA JUMP A TO S

This causes any CPU that accesses the variable or array A
to jump to location S.

(iv)   DATA UNLOCK A

This causes the resetting of any of the above interlocks
on the variable A.

These instructions enable the handling of intermediate
storage in a particularly simple manner; they also provide
a means of CPU intercommunication which is quite efficient
in many cases.
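
The intended effect of the four DATA statements on a CPU that touches the variable A can be pictured as a small table of interlocks. The Python sketch below is only a model under stated assumptions: the class `Interlocks`, its method names and the returned strings are hypothetical, and real waiting or scheduling is not simulated.

```python
class Interlocks:
    """Toy table of DATA LOCK / RELEASE / JUMP / UNLOCK states (hypothetical)."""

    def __init__(self):
        self._state = {}              # variable name -> (kind, jump target)

    def lock(self, name):             # DATA LOCK A
        self._state[name] = ("LOCK", None)

    def release(self, name):          # DATA RELEASE A
        self._state[name] = ("RELEASE", None)

    def jump(self, name, target):     # DATA JUMP A TO S
        self._state[name] = ("JUMP", target)

    def unlock(self, name):           # DATA UNLOCK A: reset any interlock on A
        self._state.pop(name, None)

    def access(self, name):
        """Return what a CPU accessing `name` would do under the current interlock."""
        kind, target = self._state.get(name, ("NONE", None))
        if kind == "LOCK":
            return "wait"
        if kind == "RELEASE":
            return "return to executive"
        if kind == "JUMP":
            return "jump to " + str(target)
        return "proceed"
```

In this model a later DATA statement on the same variable simply replaces the earlier interlock; whether the planned instructions would behave that way is an assumption.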




APPENDIX  II 
USERS  SUMMARY  OF  THE  FORTRAN  HARDWARE  COMPILER- SIMULATOR 

1.    Describing  a  Subassembly 

(a)  A subassembly is described by a Fortran subroutine
having N+1+E arguments. Here:

(ai)  N is the number of external lines, or external bundles
of lines, leading into or out of the subassembly, from or
to the simulated hardware.

(aii)  One argument is reserved for the symbolic name of the
subassembly, which is a left-justified Hollerith constant
of up to 7 characters. This Hollerith constant will appear
on the output listing as part of the 'nested context' name
associated with each gate and line of the total component
listing.

(aiii)  The third group of E arguments, which are optional,
may be used to convey any additional integers or other
information of significance for the given submodule.

EXAMPLE:   SUBROUTINE MODULE(IN1,IN2,IOUT,NAM,NUMINS1,NUMINS2)

for  a  module  with  two  bundles  of  inputs,  and  one  output, 

where  the  numbers  of  inputs  may  be  specified. 

(b)  The  first  executable  statement  in  the  subroutine  must  be 

CALL NAME(NAM)
where  NAM  is  the  subroutine  parameter  in  which  the  module 
symbolic  name  will  be  delivered.   This  call  identifies  the 
module  symbolically  to  the  hardware  compiler. 

(bi)  The  next  group  of  executable  statements  in  the  subroutine 
must  have  the  form 
CALL NAMEL(LINE, left-justified Hollerith line name)

One such statement must appear for each variable representing
a line of the submodule which is not an argument of the subroutine.
These statements identify the FORTRAN variable LINE as representing
a wire of the submodule, and assign it a symbolic name.

EXAMPLES:         CALL NAMEL(LINOUT,6LOUTPUT)
                  CALL NAMEL(L1,6LIJANDK)

(c)  If groups (i.e. arrays, i.e. bundles) of lines are
involved, the utility function NAMEGEN provided as part of
the simulator system may be used to generate symbolic names
for lines. This has the form NAMEGEN(Hollerith constant).
The Hollerith constant must be left-justified and 3 characters long.

NAMEGEN  will  then  prefix  successive  BCD  integers  to  generate 

unique  names. 

EXAMPLE:      DO 1 J=1,7
         1    CALL NAMEL(LINES(J),NAMEGEN(3LGAT))

has the same effect as

              CALL NAMEL(LINES(1),7LGAT0000)
              CALL NAMEL(LINES(2),7LGAT0001)
              CALL NAMEL(LINES(3),7LGAT0002)
              CALL NAMEL(LINES(4), ... etc.
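
The naming pattern in this example can be reproduced with a few lines of Python. This is a sketch under assumptions, not the simulator's routine: the helper `make_namegen`, the four-digit zero-padded decimal counter and the single counter shared by all prefixes are inferred from the names GAT0000, GAT0001, ... shown above.

```python
import itertools

def make_namegen():
    """Return a NAMEGEN-like function with its own running counter (hypothetical)."""
    counter = itertools.count()          # 0, 1, 2, ...

    def namegen(prefix):
        if len(prefix) != 3:
            raise ValueError("prefix must be exactly 3 characters")
        # Three-character prefix plus a four-digit serial gives a 7-character name.
        return "%s%04d" % (prefix, next(counter))

    return namegen

namegen = make_namegen()
names = [namegen("GAT") for _ in range(7)]   # one name per line of the bundle
```

Each generated name is 7 characters long, which matches the limit stated for symbolic names.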

(d)  The  next  group  of  executable  statements  must  be  a 
succession  of  calls  on  subroutines  describing  the  various 
submodules  of  the  module  described  by  the  total  subroutine. 
These  submodules  may  either  be  composite  submodules  described 
by  other  user-written  subroutines  or  may  be  one  of  the 
following elementary submodules provided as part of the
simulator system:

N1,N2,N3,N4         --  nand gates, 1 to 4 inputs

D2,D3,D4            --  dot-ands, 2 to 4 inputs

FF11, FF12, FF13, FF14, FF21, FF22, FF23, FF24,
FF31, FF32, FF33, FF34, FF41, FF42, FF43, FF44
                    --  flipflops of 1 to 4 set and
                        reset inputs.

These submodules have the evident number of input lines;
nand gates and dots have 1 output line, while flipflops
have a direct and a complement output line.

EXAMPLES:    CALL FF12(ISET,IRESET,JRESET,IOUT,IOUTC,7LCTRLBIT,
             IPHASE)
             CALL N2(I1,I2,JOUT,5LGATIN)
             CALL FULLADD(I,J,K,LOWBIT,IHBIT,5LADDR8)

(e)     The  final  group  of  executable  statements  should  be 

CALL  UNNAME 
RETURN 

The  CALL  UNNAME  statement  keeps  the  system's  internal  name 

context  accounting  in  current  status. 

2.      The  Main  Program  for  Compiling  and  Exercising  a  Total 

Assembly. 

A  full  assembly  to  be  compiled  and  exercised  will  itself 
be  described  by  a  (highest  level)  subroutine  of  the  type 
described  above.   This  subroutine  should  then  be  called  from 
a  main  routine  as  described  in  the  following  paragraph.   A 
call  in  appropriate  form  will  cause  the  static  compilation 
and dynamic exercise of the full assembly; applicable diagnostics
will  be  printed,  etc. 

(a)     The  first  executable  statement  of  the  main  program 
should  be 

(label)      CALL STEP(N)

where N is the desired number of simulation steps to be carried
out. This call initializes all system tables necessary for the
subsequent compilation-simulation. If N = 0, a hardware compilation
will be performed but no simulation attempted.

EXAMPLES:    CALL STEP(0)
             CALL STEP(150)

(bi)    The next group of executable statements should be a
collection of calls

CALL NAMEL(LINE, left-justified Hollerith line name)

One such statement must occur for each variable representing
an external output line of the assembly being compiled, that
is, a line which is an output from some gate, flipflop, dot,
or other subassembly of the full assembly. These statements
identify the FORTRAN variable LINE as representing a wire of
the assembly, and assign it a symbolic name. (Cf. 1.ci. above,
and the examples given there.)

(bii)   Input wires to the subassembly, that is, wires providing
signals to the assembly from hypothetical external equipment,
but which are not outputs of any gate, flipflop, dot, or other
subassembly, should be specified to the system by the alternate
call

CALL INAMEL(LINE, left-justified Hollerith line name, N)

where the additional parameter N specifies the number of times
the designated line may be used as an input without overloading.

EXAMPLES:    CALL INAMEL(IN1,6LINPUTA,8)
             CALL INAMEL(L11,7L66CHAN1,24)

If a group (i.e. an array or bundle) of input lines is to
be described, the utility function NAMEGEN may be used to
generate symbolic names as arguments to the subroutine INAMEL;
thus

EXAMPLE:     DO 1 J=1,7

       1     CALL INAMEL(LINES(J),NAMGEN(3LGAT),8)

is valid. (Cf. 1.ci. above for an additional discussion of
NAMGEN.)

(c)    The  next  group  of  executable  statements  is  required 

only  if  simulation  of  the  subassembly  is  intended.   This 

group  of  statements  consists  of  a  set  of  subroutine  calls 

of  the  form 

CALL SET(LINE,LINVAL)

which sets the value of the input line LINE to one of the
boolean LINVALues 0 or 1. Any input line to the assembly
not SET in this manner will have an undefined boolean value
during subsequent simulation.

EXAMPLE:     CALL SET(INLIN,1)

A separate input line value may be specified
for every subsequent step of simulation. As a convenient
source of input values a utility function KOUNT is provided.
This has the form KOUNT(N,M), whose value on the K-th call to
KOUNT is the N-th boolean bit of the integer K/M.

EXAMPLE:  CALL SET(IN1,KOUNT(1,3))
          CALL SET(IN2,KOUNT(2,3))
          CALL SET(IN3,KOUNT(3,3))
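
The rule "the N-th boolean bit of the integer K/M" can be checked with a short Python sketch. Several points are assumptions made for illustration: bit 1 is taken to be the low-order bit, K/M is taken as integer division, and K is passed in explicitly as a step counter, whereas the real KOUNT keeps its own call count. The name `kount_value` is hypothetical.

```python
def kount_value(k, n, m):
    """N-th bit (1 = low-order bit, an assumption) of the integer quotient k/m."""
    return ((k // m) >> (n - 1)) & 1

# Values for bits 1, 2, 3 with M = 3 over 24 steps: under these assumptions
# each 3-bit input pattern is held for 3 consecutive steps.
patterns = [tuple(kount_value(k, n, 3) for n in (1, 2, 3)) for k in range(24)]
```

With a common M for all input lines, the inputs count through every combination in binary, each combination lasting M simulation steps.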

(d)  Next  should  follow  a  call  to  the  'main'  subroutine 
defining  the  assembly  to  be  compiled  and  exercised. 
EXAMPLE:     CALL ADDER(IN1,IN2,IN3,LOWBIT,IHBIT,5LADDER)

(e)  The  next  group  of  executable  statements  is  required 
only  if  simulation  of  the  subassembly,  and  printout  of  the 
results  of  the  simulation,  is  desired.   This  group  consists 
of  a  set  of  statements 

J(K)  =  LINE 
defining  the  successive  locations  of  a  dimensioned  array  with 
the  various  lines  whose  values  are  to  be  printed  during 
simulation,  and  of  a  single  additional  statement 

CALL  PNTL(N,J) 
to  the  simulation-print  routine  PNTL.   The  first  argument  N 
of  PNTL  is  the  number  of  lines  represented  in  the  dimensioned 
array  J. 

(f)  The  final  executable  statement  of  the  main  program 
should  be  the  transfer 

GO TO label

which returns to the first executable statement (which,
cf. (a) above, has the form CALL STEP(N)).

3.      Flipflop Phases and Clock Signals.

The final IPHASE argument to the built-in flipflop
routines FF11, FF12, ..., etc. (cf. 1.d. above for details)
is a set of bits (up to 9 in number) describing the clock
phases on which the flipflop described by a call to one of
these subroutines is allowed to vary. If, during simulation,
a flipflop is found to vary on an illegal clock phase, a
diagnostic statement will be printed, but simulation will be
continued. (The simulator is presently provided with a
simulated two-phase internal clock, but the number of phases
may trivially be increased to a maximum of 9.) If a flipflop
is to change only on clock phases 1,3,5,...,etc., it should
be called with its final argument IPHASE=1; if it is to change
only on clock phases 2,4,6,...,etc., it should be called with
its final argument IPHASE=2. A flipflop which may change
either on even or odd phases should have both phase bits set,
and should consequently be called with its final argument
IPHASE=3 (since 3 = 1.OR.2).
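
The phase-mask convention can be illustrated in Python. The sketch assumes that mask bit 1 (value 1) stands for the first clock phase, bit 2 (value 2) for the second, and so on, and that with the two-phase clock every odd step uses bit 1 and every even step bit 2; the function names are hypothetical.

```python
def phase_bit(phase):
    """Mask bit for clock phase 1, 2, ... (value 1 for phase 1; an assumption)."""
    return 1 << (phase - 1)

def may_change(iphase, step, num_phases=2):
    """True if a flipflop with mask `iphase` may vary on clock step `step`.

    Steps are folded onto the available phases, so with a two-phase clock
    steps 1, 3, 5, ... use bit 1 and steps 2, 4, 6, ... use bit 2.
    """
    phase = (step - 1) % num_phases + 1
    return bool(iphase & phase_bit(phase))

both = phase_bit(1) | phase_bit(2)   # mask for a flipflop free to change on either phase
```

Setting both bits yields the mask value 3, that is, 1 OR 2.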

The clock phase bits may be used as gating signals via
the built-in function KLOK. KLOK(1) has the value 1 on the odd
clock phases, KLOK(2) has the value 1 on even clock phases.

The statement

CALL SET(J,KLOK(N))

will put the simulated clock signal (even or odd phases,
depending on N) on the line J; in this way, appropriate clocking
of the various submodules of the simulated hardware can be
achieved.

APPENDIX 

SOME   SAMPLE    PROGRAMS 

C      ****  SAMPLE EXCLUSIVE OR SUBROUTINE  **************
C      *****  *****  *****     SAMPLE     *****  *****  *****

      SUBROUTINE NOR(I,J,K,NAM)
C     SAMPLE SUBROUTINE TO EXEMPLIFY GENERAL SCHEME
C     BUILD UP ADDITIONAL CONTEXT
      CALL NAME(NAM)
C     ASSIGN NAMES TO INTERNAL LINES
      CALL NAMEL(KKK,3LKKK) $ CALL NAMEL(LLL,3LLLL)
      CALL NAMEL(LL,2LLL)
C     DESCRIBE 4 NAND GATES, ASSIGN GATE NAMES
      CALL N2(I,J,LL,5LGATIN)
      CALL N2(I,LL,LLL,4LGAT+)
      CALL N2(J,LL,KKK,4LGAT-)
      CALL N2(LLL,KKK,K,5LGAT^)
C     REMOVE INSERTED CONTEXT AND RETURN
      CALL UNNAME
      RETURN
      END
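
That four two-input nand gates wired in this way produce an exclusive or can be checked with a short Python model. The sketch assumes that the third gate drives the internal line KKK (so that both inputs of the final gate are defined); the function names are hypothetical and no timing is modelled.

```python
def nand(*inputs):
    """Output of a nand gate: 0 only when every input is 1."""
    return 0 if all(inputs) else 1

def xor_from_nands(i, j):
    """Evaluate the four-gate network for inputs I and J; the result is line K."""
    ll = nand(i, j)        # first gate, named GATIN
    lll = nand(i, ll)      # second gate
    kkk = nand(j, ll)      # third gate (assumed to drive KKK)
    return nand(lll, kkk)  # fourth gate: output line K

table = {(i, j): xor_from_nands(i, j) for i in (0, 1) for j in (0, 1)}
```

The resulting table is 1 exactly when the two inputs differ.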

C     ****  SAMPLE PROGRAM SHOWING THREE BIT TO TWO BIT FULL ADDER
C     *****  *****  *****     SAMPLE     *****  *****  *****

      PROGRAM HWTEXT(INPUT=100,OUTPUT=100,TAPE1=OUTPUT,TAPE2=INPUT)
      DIMENSION IPNT(5)
    1 CALL STEP(17)
      CALL INAMEL(I,1LI,8) $ CALL INAMEL(J,1LJ,8)
      CALL INAMEL(K,1LK,8)
      CALL NAMEL(IJ,2LIJ)
      CALL NAMEL(JI,2LJI)
      IPNT(1)=I $ IPNT(2)=J $ IPNT(3)=K $ IPNT(4)=IJ $ IPNT(5)=JI
      CALL SET(I,KOUNT(1,3)) $ CALL SET(J,KOUNT(2,3)) $ CALL SET(K,
     XKOUNT(3,3))
      CALL ADDR(I,J,K,IJ,JI,7LADDUNIT)
      CALL PVAL(5,IPNT)
      GO TO 1
      END

      SUBROUTINE ADDR(I,J,K,II,JJ,NAM)
      CALL NAME(NAM)
      CALL NAMEL(NIAJ,4LNIAJ) $ CALL NAMEL(NIAK,4LNIAK)
      CALL NAMEL(NJAK,4LNJAK) $ CALL NAMEL(NIJK,4LNIJK)
      CALL NAMEL(INJK,4LINJK) $ CALL NAMEL(JNIK,4LJNIK)
      CALL NAMEL(KNIJ,4LKNIJ)
      CALL N2(I,J,NIAJ,1LA) $ CALL N2(I,K,NIAK,1LB)
      CALL N2(J,K,NJAK,1LC) $ CALL N3(NIAJ,NIAK,NJAK,II,1LD)
      CALL N3(I,J,K,NIJK,1LE) $ CALL N3(I,NIAJ,NIAK,INJK,1LF)
      CALL N3(J,NJAK,NIAJ,JNIK,1LG) $ CALL N3(K,NJAK,NIAK,KNIJ,1LH)
      CALL N4(INJK,JNIK,KNIJ,NIJK,JJ,1L^)
      CALL UNNAME
      RETURN
      END
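
The gate wiring of ADDR can be verified against ordinary addition with a small Python model. This is a sketch of the logic only: it follows the line names of the listing, treats each N2, N3 and N4 call as an ideal nand gate, and ignores gate delays and the simulator's naming calls. The Python function names are hypothetical.

```python
from itertools import product

def nand(*inputs):
    """Output of a nand gate: 0 only when every input is 1."""
    return 0 if all(inputs) else 1

def addr(i, j, k):
    """Return (II, JJ), the carry and sum lines, for input bits I, J, K."""
    niaj, niak, njak = nand(i, j), nand(i, k), nand(j, k)   # gates A, B, C
    ii = nand(niaj, niak, njak)          # gate D: carry (majority of I, J, K)
    nijk = nand(i, j, k)                 # gate E
    injk = nand(i, niaj, niak)           # gate F: 0 only if I alone is 1
    jnik = nand(j, njak, niaj)           # gate G: 0 only if J alone is 1
    knij = nand(k, njak, niak)           # gate H: 0 only if K alone is 1
    jj = nand(injk, jnik, knij, nijk)    # final N4 gate: sum bit
    return ii, jj

# The two output bits, read as a binary number, equal I + J + K in every case.
ok = all(2 * carry + total == i + j + k
         for i, j, k in product((0, 1), repeat=3)
         for carry, total in [addr(i, j, k)])
```

Line II thus carries the high-order bit and line JJ the low-order bit of the three-bit sum.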
C     *****  SAMPLE PROGRAM SHOWING PARITY TREE WRITTEN IN TERMS OF
C     EXCLUSIVE OR  ***  *****  SAMPLE  *****  *****  *****

      PROGRAM HWTEXT(INPUT=100,OUTPUT=100,TAPE1=OUTPUT,TAPE2=INPUT)
      DIMENSION IPNT(5)
    1 CALL STEP(17)
      CALL INAMEL(I,1LI,8) $ CALL INAMEL(J,1LJ,8)
      CALL INAMEL(K,1LK,8) $ CALL INAMEL(L,1LL,8)
      CALL NAMEL(IJKL,4LLIN!2^)
      IPNT(1)=I $ IPNT(2)=J $ IPNT(3)=K
      IPNT(4)=L $ IPNT(5)=IJKL
      CALL SET(I,KOUNT(1,4)) $ CALL SET(J,KOUNT(2,4))
      CALL SET(K,KOUNT(3,4)) $ CALL SET(L,KOUNT(4,4))
      CALL TREE4(I,J,K,L,IJKL,5LTREE4)
      CALL PVAL(5,IPNT)
      GO TO 1
      END

      SUBROUTINE TREE4(I,J,K,L,IJKL,NAM)
C     SUBROUTINE CALLING DAND
      CALL NAME(NAM)
      CALL NAMEL(IJ,2LIJ) $ CALL NAMEL(KL,2LKL)
      CALL DAND(I,J,IJ,3LOR+) $ CALL DAND(K,L,KL,3LOR-)
      CALL DAND(IJ,KL,IJKL,5L9^^T)
      CALL UNNAME
      RETURN
      END

      SUBROUTINE DAND(I,J,IJ,NAM)
C     SUBROUTINE USING AND-DOT FOR EXCLUSIVE OR
      CALL NAME(NAM)
      CALL NAMEL(NI,2LNI) $ CALL NAMEL(NJ,2LNJ)
      CALL NAMEL(IAJ,3LIAJ) $ CALL NAMEL(NIJ,3LNIJ)
      CALL N1(I,NI,4LINVI) $ CALL N1(J,NJ,4LINVJ)
      CALL N2(I,J,IAJ,2LG1) $ CALL N2(NJ,NI,NIJ,2LG2)
      CALL D2(IAJ,NIJ,IJ,5LD^)
      CALL UNNAME
      RETURN
      END

C     SAMPLE PROGRAM SHOWING TWO BIT SHIFT REGISTER  *****  *****
C     ******  *****  *****     SAMPLE     *****  *****  *****
      PROGRAM HWTEXT(INPUT=100,OUTPUT=100,TAPE1=OUTPUT,TAPE2=INPUT)
      DIMENSION IPNT(5)
    1 CALL STEP(17)
      CALL INAMEL(I,1LI,8) $ CALL INAMEL(J,1LJ,8)
      CALL INAMEL(K,1LK,8)
      CALL NAMEL(IJ,2LIJ)
      CALL NAMEL(JI,2LJI)
      CALL SET(I,1) $ CALL SET(J,KOUNT(1,2)) $ CALL SET(K,KOUNT(2,2))
      IPNT(1)=I $ IPNT(2)=J $ IPNT(3)=K $ IPNT(4)=IJ $ IPNT(5)=JI
      CALL SHIF2(I,J,K,IJ,JI,5LSHIF2)
      CALL PVAL(5,IPNT)
      GO TO 1
      END

      SUBROUTINE SHIF2(I,J,K,LL1,MM1,NAM)
C     2-BIT CIRCULAR SHIFT FOR TESTING
      CALL NAME(NAM)
      CALL NAMEL(L1,2LL1) $ CALL NAMEL(L2,2LL2)
      CALL NAMEL(M1,2LM1) $ CALL NAMEL(M2,2LM2)
      CALL NAMEL(LL2,3LLL2)
      CALL NAMEL(MM2,3LMM2)
      CALL N2(I,LL1,L1,4LGLL1) $ CALL N2(I,LL2,L2,4LGLL2)
      CALL N1(J,M1,4LGMM1) $ CALL N2(K,MM2,M2,4LGMM2)
      CALL FF11(L1,L2,MM1,MM2,4LEVEN,1) $ CALL FF11(M1,M2,LL1,
     XLL2,3LODD,2)
      CALL UNNAME
      RETURN
      END
[end] 
INTRODUCTION

SOAPSUDS IS SCHWARTZ'S OWN ATHENE PROCESSOR, SERIAL UNIPROCESSOR
DEBUGGING SIMULATOR. IT IS A PROGRAM CODED FOR THE CDC 6600 AND DESIGNED
TO SIMULATE A SET OF SIXTY CENTRAL PROCESSING UNITS, SIMILAR TO THE
CDC 6600, OPERATING IN PARALLEL FROM A COMMON MEMORY. IN ADDITION TO
THE INSTRUCTION REPERTOIRE OF THE CDC 6600, A NUMBER OF INSTRUCTIONS
SUITABLE FOR A MULTI-PROCESSOR ENVIRONMENT HAVE BEEN IMPLEMENTED.

THE SIMULATOR HAS BEEN DESIGNED FOR DEBUGGING AND ESTIMATING THE
EFFICIENCY OF MULTI-PROCESSOR PROGRAMS, AND FOR STUDYING THE CHARACTERISTICS
OF LARGE SCALE MULTI-PROCESSOR SYSTEMS, IN ORDER TO OBTAIN QUANTITATIVE
DATA FOR THE DESIGN OF FUTURE SYSTEMS OF THIS TYPE. TO THIS END, A NUMBER OF
OPTIONS ARE AVAILABLE FOR TRAPPING AND TRACING INSTRUCTIONS, CHECKING
THE VALUES OF SPECIFIED LOCATIONS, AND MEASURING THE RUNNING TIMES OF
ROUTINES.
THE SOAPSUDS PROGRAM IS DESIGNED TO SIMULATE A CONFIGURATION
OF CENTRAL PROCESSING UNITS KNOWN AS ATHENE 1. THIS PROCESSING SYSTEM
WAS DESIGNED BY PROF. JACK SCHWARTZ, WHO COINED THE TERM ATHENE
PROCESSOR (1).

THE TERM ATHENE PROCESSOR REFERS TO A SET OF CENTRAL PROCESSING
UNITS, EACH WITH ITS OWN INSTRUCTION LOCATION COUNTER, OPERATING FROM
A COMMON MEMORY. SOAPSUDS PROVIDES FOR THE SIMULATION OF UP TO SIXTY
PROCESSORS.

EACH PROCESSOR IS SIMILAR TO THE CENTRAL PROCESSING UNIT OF THE
CDC 6600. IT HAS 24 ARITHMETIC REGISTERS (THE A, B, AND X REGISTERS),
AND ALL THE INSTRUCTIONS WHICH ARE ON THE 6600, USING THE SAME OPCODES
AS THE 6600. EACH PROCESSOR HAS ITS OWN INSTRUCTION LOCATION COUNTER
AND UPPER AND LOWER MEMORY BOUNDS (A PROCESSOR CANNOT LOAD, STORE, OR
EXECUTE CODE OUTSIDE ITS OWN MEMORY BOUNDARIES). IN ADDITION, EACH
PROCESSOR HAS A SPECIAL REGISTER, KNOWN AS THE ASSIGNED BADGENUMBER
REGISTER, WHICH MAY BE READ BY THE PROCESSOR WITH ONE OF THE EXTRA
INSTRUCTIONS, READ BADGENUMBER. THE BADGENUMBER REGISTER CAN BE SET
ONLY BY A SPECIAL SUPERVISORY COMPUTER, WHOSE SIMULATION HAS NOT BEEN
IMPLEMENTED ON THE PRESENT SOAPSUDS SYSTEM.

THE INSTRUCTIONS WHICH HAVE BEEN ADDED TO THE 6600 REPERTOIRE ARE

1.  READ BADGENUMBER   (RBN    BI)

THIS INSTRUCTION PLACES THE CONTENTS OF THE ASSIGNED
BADGENUMBER REGISTER IN B REGISTER NUMBER I. IF I=0,
THE INSTRUCTION IS A NO-OP.

THE OPERATION CODE IS 001, AND THE INSTRUCTION IS 30 BITS
LONG. THE REGISTER NUMBER APPEARS IN BITS 15-17 (LOW
ORDER BIT = BIT 0).

2.  REPLACE ADD   (RAD    XI,BJ+K)

-00IJAAAAAA-


THIS INSTRUCTION STORES THE 60-BIT ONE'S-COMPLEMENT SUM OF
REGISTER XI AND THE CONTENTS OF LOCATION BJ+K BOTH IN REGISTER
XI AND AT LOCATION BJ+K. IF TWO OR MORE PROCESSORS
SIMULTANEOUSLY REACH A REPLACE ADD INSTRUCTION REFERENCING
THE SAME LOCATION, THE PROCESSORS WILL EXECUTE THE INSTRUCTION
SERIALLY (IN ORDER BY PROCESSOR NUMBER). REPLACE ADD
INSTRUCTIONS CAN ONLY BE PERFORMED WITH REGISTERS X6 AND X7,
SO THE OPERATION CODES ARE 006 (REPLACE ADD X6) AND 007
(REPLACE ADD X7). THE INSTRUCTION IS 30 BITS LONG. THE
ADDRESS PORTION K (AAAAAA) IS IN BITS 0-17, AND THE NUMBER OF
THE B REGISTER (J) ADDED TO K TO COMPUTE THE EFFECTIVE ADDRESS
IS GIVEN IN BITS 18-20.
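
THE BEHAVIOUR OF REPLACE ADD CAN BE PICTURED WITH A SHORT PYTHON MODEL. IT IS A SKETCH UNDER ASSUMPTIONS, NOT THE SOAPSUDS CODE: THE FUNCTION NAMES ARE HYPOTHETICAL, THE ADDER IS A SIMPLIFIED END-AROUND-CARRY MODEL (THE TREATMENT OF NEGATIVE ZERO ON THE REAL 6600 MAY DIFFER), AND THE ENCODING ASSUMES THE OCTAL LAYOUT 00IJAAAAAA WITH J IN BITS 18-20 AND K IN BITS 0-17.

```python
MASK60 = (1 << 60) - 1

def ones_complement_add(a, b):
    """60-bit ones'-complement sum with end-around carry (simplified model)."""
    total = a + b
    if total > MASK60:                 # carry out of bit 59 wraps round to bit 0
        total = (total & MASK60) + 1
    return total & MASK60

def replace_add(registers, memory, i, address):
    """RAD Xi,Bj+K: the sum is stored both in Xi and in the memory word."""
    result = ones_complement_add(registers[i], memory[address])
    registers[i] = result
    memory[address] = result
    return result

def encode_rad(i, j, k):
    """30-bit word: octal digits 0 0 I J AAAAAA (I = 6 or 7)."""
    assert i in (6, 7) and 0 <= j < 8 and 0 <= k < (1 << 18)
    return (i << 21) | (j << 18) | k

# Three processors, each with X6 = 1, reach the same location together;
# serving them in order of processor number gives each a distinct result.
memory = {0o100: 0}
cpus = [{6: 1}, {6: 1}, {6: 1}]
results = [replace_add(regs, memory, 6, 0o100) for regs in cpus]
```

BECAUSE EACH PROCESSOR RECEIVES THE UPDATED SUM IN ITS OWN X REGISTER, PROCESSORS ADDING 1 TO A SHARED COUNTER OBTAIN DIFFERENT VALUES, WHICH IS ONE WAY SUCH AN INSTRUCTION CAN BE USED TO HAND OUT UNIQUE NUMBERS.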

3.  EXIT   (XIT   BJ+K)

IN ADDITION TO THE REGISTERS DESCRIBED ABOVE, EACH CPU HAS A
TOGGLE KNOWN AS THE PROGRAM/RESIDENT MODE INDICATOR. THIS
INDICATOR AFFECTS ONLY THE EXECUTION OF ONE INSTRUCTION, THE
EXIT INSTRUCTION, TO BE DESCRIBED BELOW.

ALL THE PROCESSORS SHARE IN COMMON A SET OF 3 ADDRESSES:
THE RESIDENT UPPER LIMIT, THE RESIDENT LOWER LIMIT, AND THE
RESIDENT EXIT ADDRESS. WHENEVER THE PROGRAM/RESIDENT MODE
INDICATOR IS IN PROGRAM MODE AND AN EXIT INSTRUCTION IS
EXECUTED, THE UPPER AND LOWER MEMORY LIMITS FOR THE PROCESSOR
ARE SET TO THE RESIDENT MEMORY LIMITS, THE PROCESSOR TRANSFERS
CONTROL TO THE RESIDENT EXIT ADDRESS, AND THE INDICATOR CHANGES
TO RESIDENT MODE. ON THE OTHER HAND, IN RESIDENT MODE, THE
CONTENTS OF LOCATION BJ+K ARE USED TO DETERMINE THE NEW UPPER
AND LOWER MEMORY LIMITS AND P REGISTER. THE HIGH 20 BITS
(40-59) SPECIFY THE ADDRESS TO WHICH CONTROL IS TRANSFERRED,
AND THE MIDDLE 20 BITS (20-39) AND LOW 20 BITS (0-19) BECOME
THE LOWER AND UPPER MEMORY LIMITS, RESPECTIVELY. THE INDICATOR
IS THEN SET TO PROGRAM MODE. THUS REPEATED EXECUTION TOGGLES
THE PROCESSOR BETWEEN RESIDENT AND PROGRAM MODES.
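
THE FIELD LAYOUT AND THE MODE TOGGLE CAN BE SKETCHED IN PYTHON AS FOLLOWS. THE DICTIONARY KEYS AND FUNCTION NAMES ARE HYPOTHETICAL, AND THE SKETCH ASSUMES, AS THE CLOSING REMARK ON TOGGLING IMPLIES, THAT AN EXIT EXECUTED IN PROGRAM MODE LEAVES THE PROCESSOR IN RESIDENT MODE.

```python
FIELD = (1 << 20) - 1   # mask for one 20-bit field

def decode_exit_word(word):
    """Split a 60-bit word into (P, lower limit, upper limit)."""
    p = (word >> 40) & FIELD        # bits 40-59: address receiving control
    lower = (word >> 20) & FIELD    # bits 20-39: lower memory limit
    upper = word & FIELD            # bits 0-19: upper memory limit
    return p, lower, upper

def exit_instruction(cpu, resident, memory, address):
    """Apply EXIT to `cpu`; `resident` holds the three shared addresses."""
    if cpu["mode"] == "program":
        # Program mode: enter the resident with the resident limits.
        cpu["lower"], cpu["upper"] = resident["lower"], resident["upper"]
        cpu["p"] = resident["exit"]
        cpu["mode"] = "resident"
    else:
        # Resident mode: the word at BJ+K supplies P and both limits.
        cpu["p"], cpu["lower"], cpu["upper"] = decode_exit_word(memory[address])
        cpu["mode"] = "program"

resident = {"lower": 0, "upper": 0o7777, "exit": 0o100}
cpu = {"mode": "program", "p": 0o5000, "lower": 0o4000, "upper": 0o6000}
memory = {0o200: (0o5001 << 40) | (0o4000 << 20) | 0o6000}
exit_instruction(cpu, resident, memory, 0o200)   # program -> resident
mode_after_first = cpu["mode"]
exit_instruction(cpu, resident, memory, 0o200)   # resident -> program
```

IN THIS MODEL THE WORD AT LOCATION BJ+K IS CONSULTED ONLY IN RESIDENT MODE, SO A USER PROGRAM CANNOT CHOOSE ITS OWN LIMITS BY EXECUTING EXIT.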

THIS INSTRUCTION WAS DESIGNED TO FACILITATE THE
IMPLEMENTATION OF AN OPERATING SYSTEM UNDER SOAPSUDS. USING
THIS INSTRUCTION, MEMORY LIMITS CAN BE ALTERED AS NEEDED, SO
THAT ONE PROGRAM IN THE SYSTEM WILL BE PREVENTED FROM WRITING
OVER ANY OTHER PROGRAM. FURTHERMORE, THE CODE WHICH DETERMINES
THE NEW LIMITS IS IN A SPECIALLY PROTECTED AREA OF MEMORY, SO
NO PROGRAM MAY ACCIDENTALLY ALTER ITS MEMORY LIMITS TO INCLUDE




THE SOAPSUDS SYSTEM -- GENERAL DESCRIPTION

A MAJOR SECTION OF SOAPSUDS IS AN OFFSHOOT OF WATCHER, A DEBUGGING
AID FOR THE 6600 WHICH SIMULATES THE 6600 CENTRAL PROCESSOR (2). SOAPSUDS,
LIKE WATCHER, USES THE PROGRAM TO BE SIMULATED AS DATA, ANALYZING THE
INSTRUCTIONS AND PERFORMING THE OPERATIONS THEY REQUEST. A BLOCK IN
MEMORY IS RESERVED TO HOLD THE SIMULATED REGISTERS FOR EACH PROCESSOR.

IT TAKES ROUGHLY 80 TIMES AS LONG FOR SOAPSUDS TO SIMULATE AN
INSTRUCTION AS IT DOES FOR THE 6600 TO ACTUALLY EXECUTE AN INSTRUCTION.
HENCE, IF ALL 60 SIMULATED PROCESSORS ARE RUNNING, IT TAKES ABOUT 60*80
TIMES AS LONG FOR SOAPSUDS TO SIMULATE ONE INSTRUCTION FOR ALL THE
PROCESSORS AS IT DOES FOR THE 6600 TO EXECUTE ONE INSTRUCTION. THIS IMPLIES
A MAXIMUM SPEED ON THE ORDER OF ONE INSTRUCTION PER MILLISECOND, I.E., 1000
INSTRUCTIONS PER SECOND PER SIMULATED PROCESSOR.
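
THE ARITHMETIC BEHIND THIS ESTIMATE CAN BE CHECKED WITH A FEW LINES OF
PYTHON. THE FUNCTION NAMES ARE HYPOTHETICAL, AND THE NATIVE RATE COMPUTED
HERE IS ONLY THE RATE IMPLIED BY THE FIGURES ABOVE, NOT A MEASURED SPEED OF
THE 6600.

```python
SIMULATION_OVERHEAD = 80   # native instruction times per simulated instruction
NUM_PROCESSORS = 60        # simulated processors all running


def slowdown(overhead=SIMULATION_OVERHEAD, processors=NUM_PROCESSORS):
    """Native instruction times needed to advance every processor one step."""
    return overhead * processors


def implied_native_rate(simulated_rate_per_processor):
    """Native instructions per second implied by a simulated rate."""
    return simulated_rate_per_processor * slowdown()
```

WITH 60 PROCESSORS THE SLOWDOWN FACTOR IS 4800, SO A RATE OF 1000 SIMULATED
INSTRUCTIONS PER SECOND CORRESPONDS TO 4,800,000 NATIVE INSTRUCTIONS PER
SECOND.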

INCLUDED IN SOAPSUDS IS A TRAPPING SYSTEM WHEREBY, WHEN A SPECIFIED
OCCURRENCE TAKES PLACE, SOAPSUDS TEMPORARILY STOPS SIMULATION AND TRANSFERS
CONTROL TO A TRAP ROUTINE. TRAPS MAY OCCUR ON THE EXECUTION OF A
PARTICULAR OPCODE BY A PARTICULAR PROCESSOR, ON TRANSFER OF CONTROL TO
A PARTICULAR LOCATION, OR ON A LOAD FROM OR STORE TO A PARTICULAR
LOCATION. NOTE THAT CONTROL IS ACTUALLY TRANSFERRED TO THE TRAP ROUTINE,
I.E., THE ROUTINE IS EXECUTED DIRECTLY BY THE 6600, NOT SIMULATED BY
SOAPSUDS.

ALONG WITH PROVISIONS FOR TRAPPING, SOAPSUDS HAS A COMPLETE SET OF
TRACES. ANY EVENT WHICH MAY BE TRAPPED ON MAY ALSO BE TRACED. IN
SOAPSUDS, TRACING IS SIMPLY A SPECIAL TRAP, WHICH TRAPS TO A ROUTINE
THAT PRINTS A MESSAGE (TRACE). ROUTINES IN SOAPSUDS TURN TRAPS AND TRACES
ON AND OFF, AND SUSPEND THEM TEMPORARILY.

IN ADDITION TO TRAPPING AND TRACING, THERE ARE TWO OPTIONS IN
SOAPSUDS FOR CHECKING THE CONTENTS OF SPECIFIED LOCATIONS. ONE OF THE
ROUTINES CORRECTS THE CONTENTS OF THE LOCATION IF IT IS INCORRECT; THE
OTHER DOES NOT. THESE TWO ROUTINES CHECK THE VALUES OF THE LOCATIONS
EACH TIME A STORE IS MADE INTO THE LOCATION.




PRIVATE MEMORY

SOAPSUDS HAS A BUILT-IN MEMORY FEATURE WHICH ENABLES THE WRITING
OF PARALLEL PROCEDURES IN FORTRAN WITH RELATIVE EASE: THE ABILITY
TO HANDLE LOADS AND STORES TO A USER-DEFINED REGION OF MEMORY AS
BEING PRIVATE TO THE PROCESSOR THAT EXECUTED THE LOAD OR STORE,
I.E., EACH PROCESSOR HAS A UNIQUE LOCATION ASSOCIATED WITH THE ADDRESS OF
THE LOAD OR STORE WHICH CONTAINS THE ACTUAL WORD THAT IS BEING LOADED OR
STORED. ONE SEES IMMEDIATELY THAT, IF THE ROUTINES THAT ARE TO BE
OPERATED IN PARALLEL ARE PLACED IN THE PRIVATE MEMORY REGION, THE
COMPILER-GENERATED TEMPORARY STORES AND RESTORES AND THE CALLS TO THESE
ROUTINES ARE CORRECTLY HANDLED AUTOMATICALLY. FURTHERMORE, DO LOOPS
CAN BE EXECUTED COMPLETELY IN PARALLEL WITH EACH CPU HAVING ITS OWN
VALUE OF THE DO VARIABLE.

THUS, FOR EXAMPLE, IF NEWVAL IS A FUNCTION THAT INCREMENTS ITS PARAMETER
BY 1 IN A SERIAL MANNER (ONLY ONE CPU AT A TIME), J IS DEFINED AS A
'PUBLIC' VARIABLE (I.E., LOADS AND STORES REFERENCING J ARE PERFORMED IN
THE USUAL MANNER), JJ IS DEFINED AS A 'PRIVATE' VARIABLE (IN THE REGION
OF MEMORY DESIGNATED AS PRIVATE) AND J IS INITIALLY SET TO 0, THE
FOLLOWING CODING WILL GIVE EACH CPU EXECUTING IT A DIFFERENT VALUE OF JJ
UNTIL THE CPU'S JJ EXCEEDS N, AT WHICH TIME THE CPU IS SHUNTED OFF TO
STATEMENT 100.

1     JJ = NEWVAL(J)

      IF (JJ .GT. N) GO TO 100

      (PROGRAM)

      GO TO 1
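
THE SAME WORK-SHARING PATTERN CAN BE IMITATED WITH PYTHON THREADS, AS A
ROUGH ANALOGY ONLY. IN THIS SKETCH A LOCK STANDS IN FOR THE SERIAL EXECUTION
OF NEWVAL, A LOCAL VARIABLE STANDS IN FOR THE PRIVATE JJ, AND NEWVAL IS
ASSUMED TO RETURN THE INCREMENTED VALUE OF J. THE NAMES run_cpus AND
work_done ARE HYPOTHETICAL.

```python
import threading


def run_cpus(num_cpus, n):
    """Let num_cpus threads share the values 1..n through a public counter."""
    public = {"J": 0}                     # public variable, shared by all CPUs
    lock = threading.Lock()               # makes NEWVAL run one CPU at a time
    work_done = [[] for _ in range(num_cpus)]

    def newval():
        # Assumption: NEWVAL returns the value of J after incrementing it.
        with lock:
            public["J"] += 1
            return public["J"]

    def cpu(number):
        while True:
            jj = newval()                 # JJ is private to this thread
            if jj > n:                    # IF (JJ .GT. N) GO TO 100
                return
            work_done[number].append(jj)  # stands in for (PROGRAM)

    threads = [threading.Thread(target=cpu, args=(i,)) for i in range(num_cpus)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return work_done
```

WHICH CPU RECEIVES WHICH VALUE VARIES FROM RUN TO RUN, BUT EVERY VALUE FROM
1 THROUGH N IS HANDLED BY EXACTLY ONE CPU.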

IN PRACTICE, THE VARIABLES AND ARRAYS THAT ARE TO BE PUBLIC (I.E.,
THOSE VARIABLES THAT ARE REFERENCED BY THE CPU'S IN THE ORDINARY WAY)
ARE PLACED IN BLANK OR NUMBERED COMMON, AS THESE ARE ASSIGNED TO THE END OF
MEMORY, WHILE PRIVATE VARIABLES OR ARRAYS ARE ASSIGNED IMPLICITLY OR BY
DIMENSION OR LABELED COMMON STATEMENTS.

DESIGNATING  THE  MEMORY  REGIONS 

IN ADDITION TO THE PRIVATE REGION DESCRIBED ABOVE, SOAPSUDS
REQUIRES THE SPECIFICATION OF TWO AUXILIARY REGIONS:

A. THE PRIVATE EXECUTION REGION, DESIGNATING THE AREA IN
WHICH LOADS AND STORES TO THE PRIVATE REGION ARE HANDLED AS ABOVE (THE
COMPLEMENT OF THIS REGION ALLOWS ORDINARY LOADING AND STORING INTO THE
PRIVATE AREA FOR SUCH PURPOSES AS RESETTING CODING LINES, ETC.)

B. THE STORAGE REGION, IN WHICH SOAPSUDS KEEPS THE PRIVATELY
STORED INFORMATION.

THE REGIONS OF MEMORY ARE DEFINED BY A THREE-ELEMENT ARRAY
WHOSE ADDRESS IS PROVIDED BY THE SIXTH PARAMETER OF THE SOAPSUDS CALL.
EACH ELEMENT OF THE ARRAY CONTAINS TWO ADDRESSES: THE LOWER BOUND OF
THE AREA IT DESIGNATES, IN BITS 47-30, AND THE UPPER BOUND, IN BITS 17-0.
THE FIRST ELEMENT DESIGNATES THE PRIVATE MEMORY BOUNDS, THE SECOND, THE
BOUNDS OF THE PRIVATE EXECUTION AREA, AND THE THIRD, THE BOUNDS OF THE
STORAGE AREA.
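
THE LAYOUT OF THE THREE-ELEMENT ARRAY CAN BE ILLUSTRATED WITH A SMALL PYTHON
SKETCH. THE HELPER NAMES ARE HYPOTHETICAL, AND THE ADDRESSES IN THE EXAMPLE
ARE INVENTED FOR ILLUSTRATION.

```python
ADDR_MASK = (1 << 18) - 1   # an address occupies 18 bits


def pack_bounds(lower, upper):
    """Place the lower bound in bits 47-30 and the upper bound in bits 17-0."""
    return ((lower & ADDR_MASK) << 30) | (upper & ADDR_MASK)


def unpack_bounds(word):
    """Recover (lower, upper) from one element of the region array."""
    return (word >> 30) & ADDR_MASK, word & ADDR_MASK


def make_region_array(private, execution, storage):
    """Build the array in the order: private, private execution, storage."""
    return [pack_bounds(lo, hi) for lo, hi in (private, execution, storage)]
```

FOR INSTANCE, make_region_array((0o1000, 0o1777), (0o1000, 0o3777),
(0o4000, 0o7777)) YIELDS THREE WORDS, EACH OF WHICH unpack_bounds RETURNS TO
ITS ORIGINAL PAIR OF BOUNDS.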

SO LONG AS ALL INSTRUCTIONS IN THE PROGRAM ARE WITHIN THE SECOND
PAIR OF LIMITS ('WITHIN WHICH STORES TO PRIVATE MEMORY ARE TREATED AS
PRIVATE STORES'), SOAPSUDS COMPLETELY SIMULATES THE EXISTENCE OF PRIVATE
MEMORY REGIONS, AND THE USER NEED NOT CONCERN HIMSELF WITH THE METHOD
OF SIMULATION. HOWEVER, SIMULATED MEMORY ACCESSES OUTSIDE THESE LIMITS
WILL NOT PRESERVE THE ILLUSION, SO IT IS POSSIBLE, FOR EXAMPLE, FOR
ONE PROCESSOR TO STORE INTO ANOTHER PROCESSOR'S PRIVATE MEMORY. A
PROCESSOR OUTSIDE THESE LIMITS, LOADING A WORD IN THE PRIVATE REGION,
MUST CHECK WHETHER THE HIGH 42 BITS ARE 00202211260124. IF THEY ARE,
THE SIMULATED MEMORY WORDS CORRESPONDING TO THAT ADDRESS ARE CONTAINED
IN AN ARRAY WHOSE LOCATION IS GIVEN BY THE LOW 18 BITS. THE LENGTH
OF THE ARRAY IS THE NUMBER OF PROCESSORS, AND THE POSITIONS IN THE ARRAY
CORRESPOND TO THE PROCESSOR NUMBERS (THE FIRST ELEMENT IS PROCESSOR
ZERO'S PRIVATE MEMORY WORD, THE SECOND IS PROCESSOR ONE'S, ETC.). LOADS
AND STORES MUST IN GENERAL BE MADE TO THIS ARRAY; A STORE DIRECTLY INTO THE
PRIVATE REGION WILL IN EFFECT SET THE LOCATION TO THE SAME VALUE FOR ALL
PROCESSORS.
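
THE LOOKUP THAT SUCH OUTSIDE CODE MUST PERFORM CAN BE SKETCHED IN PYTHON.
THIS IS AN ILLUSTRATION ONLY: MEMORY IS MODELED AS A DICTIONARY OF 60-BIT
INTEGERS, THE FUNCTION NAMES ARE HYPOTHETICAL, THE TAG CONSTANT IS AN ASSUMED
READING OF THE 42-BIT PATTERN QUOTED ABOVE, AND TREATING AN UNTAGGED WORD AS
AN ORDINARY SHARED VALUE IS ALSO AN ASSUMPTION.

```python
PRIVATE_TAG = 0o00202211260124   # assumed reading of the 42-bit pattern
ADDR_MASK = (1 << 18) - 1


def private_array_base(word):
    """Return the array address if the word carries the tag, else None."""
    if (word >> 18) == PRIVATE_TAG:
        return word & ADDR_MASK
    return None


def load_private(memory, address, processor):
    """Load the word at address as seen by the given processor."""
    word = memory[address]
    base = private_array_base(word)
    if base is None:
        return word               # assumed: untagged word is an ordinary value
    return memory[base + processor]


def store_private(memory, address, processor, value):
    """Store into one processor's copy, leaving the other copies alone."""
    base = private_array_base(memory[address])
    if base is None:
        raise ValueError("no private array has been set up for this address")
    memory[base + processor] = value
```

STORING THROUGH store_private CHANGES ONLY THE ELEMENT BELONGING TO THE
NAMED PROCESSOR, WHEREAS OVERWRITING memory[address] ITSELF WOULD REPLACE THE
TAGGED WORD SEEN BY ALL PROCESSORS.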

USERS WHO MAKE CALLS FROM OUTSIDE THE SECOND LIMITS MUST BE
ESPECIALLY CAUTIOUS TO CHECK WHETHER ANY OF THE ARGUMENTS IS A POINTER
TO A PRIVATE MEMORY ARRAY (IN WHICH CASE THE ACTUAL ADDRESS IN THE ARRAY
MUST BE SUBSTITUTED). FOR CALLS WITHIN THESE LIMITS, SOAPSUDS TAKES
CARE OF ALL NECESSARY ARRANGEMENTS.

FOR NORMAL CODING (WITHIN THE SECOND PAIR OF LIMITS) THE USER
NEED ONLY MAKE SURE HE ALLOCATES SUFFICIENT SPACE FOR SOAPSUDS TO KEEP
THE PRIVATE MEMORY ARRAYS (BETWEEN THE THIRD PAIR OF LIMITS). THE
MINIMUM NUMBER OF LOCATIONS IS THE NUMBER OF ADDRESSES IN THE PRIVATE
REGION INTO WHICH STORES ARE MADE (VARIABLES+TEMPORARIES+INDIRECTS)
TIMES THE NUMBER OF PROCESSORS. IF INSUFFICIENT SPACE IS ALLOTTED,
SOAPSUDS WILL ERROR EXIT WHEN THE SPACE IS EXCEEDED (SEE ERROR EXITS,
BELOW).

USING THE SIMULATOR -- BASIC INFORMATION

SOAPSUDS IS AVAILABLE AS A RELOCATABLE CHIPPEWA BINARY DECK. THE
SOAPSUDS PACKAGE CONSISTS OF THE MAIN SOAPSUDS BINARY DECK AND A SET OF
CONNECTING SUBROUTINES, IN ASCENTF (SYMBOLIC). THE PROGRAM USING
SOAPSUDS NEVER CALLS SOAPSUDS DIRECTLY. INSTEAD, ALL CALLS ARE MADE
THROUGH ONE OF THE CONNECTING SUBROUTINES. THE COMPLETE SOAPSUDS
PACKAGE MAY BE INCLUDED IN A DECK WITH THE PROGRAM BEING TESTED AS
REGULAR BINARY AND SYMBOLIC SUBROUTINES.

ENTRY 

SINCE THE USER SUPPLIES THE MAIN PROGRAM, THE PROGRAM IS INITIALLY
NOT UNDER SOAPSUDS SIMULATION. TO BEGIN SIMULATION, THE FOLLOWING CALL
MUST BE EXECUTED

      CALL SOAPSDS(NUMCP,LOLIM,HILIM,START,XTLOC,MMBND)

WHERE   NUMCP   IS THE TOTAL NUMBER OF PROCESSORS TO BE SIMULATED
        LOLIM   IS THE INITIAL LOWER MEMORY LIMIT FOR ALL PROCESSORS, AND
                THE RESIDENT LOWER MEMORY LIMIT
        HILIM   IS THE INITIAL UPPER MEMORY LIMIT FOR ALL PROCESSORS, AND
                THE RESIDENT UPPER MEMORY LIMIT
        START   IS THE LOCATION AT WHICH ALL PROCESSORS BEGIN EXECUTION
        XTLOC   IS THE LOCATION TO WHICH CONTROL IS TRANSFERRED ON THE
                EXECUTION OF AN XIT INSTRUCTION BY A PROCESSOR IN PROGRAM
                MODE (THE RESIDENT EXIT ADDRESS)
        MMBND   IS THE LOCATION OF THE THREE-ELEMENT ARRAY THAT SETS THE
                PRIVATE MEMORY BOUNDS (SEE DISCUSSION ABOVE)
IF NUMCP IS ZERO, ONE PROCESSOR WILL BE SIMULATED.

THE ASSIGNED BADGENUMBER OF EACH PROCESSOR IS SET TO THE ABSOLUTE
BADGENUMBER (I.E., THE NUMBER) OF THE PROCESSOR. THUS PROCESSOR ZERO
GETS AN ASSIGNED BADGENUMBER OF ZERO, PROCESSOR ONE AN ASSIGNED
BADGENUMBER OF ONE, ETC. ALL ARITHMETIC REGISTERS START OUT ZERO.

ANY ATTEMPT TO LOAD, STORE OR TRANSFER CONTROL TO THE SOAPSUDS
PROGRAM AREA WILL CAUSE AN OUT OF BOUNDS ERROR. AFTER SOAPSDS HAS BEEN
CALLED, ANY FURTHER CALLS TO SOAPSDS WILL RESULT IN AN ERROR EXIT.

AUTOMATIC CHECKS DURING SIMULATION

THE FOLLOWING EVENTS ARE CHECKED FOR DURING SIMULATION AND
ARE CONSIDERED ERROR CONDITIONS:

1. PROGRAM STOP (EXIT MODE 0)

2. LOAD OR STORE OUT OF BOUNDS (EXIT MODE 1)

3. TRANSFER OF CONTROL TO A LOCATION OUTSIDE MEMORY BOUNDS
(EXIT MODE 2)

4. EXECUTION OF A 30-BIT OPCODE IN THE LOW QUARTER OF A WORD
('EXECUTING DATA') (EXIT MODE 4)

5. INSUFFICIENT SPACE FOR PRIVATE MEMORY ARRAYS (EXIT MODE 8)

6. END IN RA+1 (EXIT MODE 16)

7. ABT IN RA+1 (EXIT MODE 32)

8. ERROR IN A CALL TO A SOAPSUDS OPTION (EXIT MODE 64)

ON ENCOUNTERING ANY OF THESE CONDITIONS, EXCEPT FOR A PROGRAM STOP,
SOAPSUDS WILL PRINT OUT A SNAP OF THE PROCESSOR AND A SOAPSNAP, AND,
IF THE RESIDENT EXIT ADDRESS (THE FIFTH PARAMETER IN THE CALL TO
SOAPSDS) IS ZERO, WILL ABORT. IF IT IS NOT ZERO, AND THE PROCESSOR
IS IN RESIDENT MODE, SOAPSUDS WILL ALSO ABORT. HOWEVER, IF IT IS
NON-ZERO AND THE PROCESSOR IS IN PROGRAM MODE, SOAPSUDS WILL SET B7 TO
3, WILL CLEAR X0 AND STORE THE CURRENT P REGISTER IN BITS 30-47
AND THE EXIT MODE IN BITS 48-59 OF X0, AND WILL XIT TO THE RESIDENT.
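
THE ERROR-EXIT RULES AND THE LAYOUT OF X0 CAN BE SUMMARIZED IN A SHORT PYTHON
SKETCH. THE ENUMERATION MEMBER NAMES AND FUNCTION NAMES ARE HYPOTHETICAL;
ONLY THE NUMERIC EXIT MODES AND THE BIT POSITIONS COME FROM THE TEXT.

```python
from enum import IntEnum


class ExitMode(IntEnum):
    PROGRAM_STOP = 0
    LOAD_STORE_OUT_OF_BOUNDS = 1
    TRANSFER_OUT_OF_BOUNDS = 2
    EXECUTING_DATA = 4
    PRIVATE_SPACE_EXHAUSTED = 8
    END_IN_RA1 = 16
    ABT_IN_RA1 = 32
    OPTION_CALL_ERROR = 64


P_MASK = (1 << 18) - 1      # P register field, bits 30-47
MODE_MASK = (1 << 12) - 1   # exit mode field, bits 48-59


def pack_x0(p, exit_mode):
    """Build the X0 word handed to the resident on an error exit."""
    return ((int(exit_mode) & MODE_MASK) << 48) | ((p & P_MASK) << 30)


def unpack_x0(x0):
    """Return (p, exit_mode) as a resident routine might decode them."""
    return (x0 >> 30) & P_MASK, (x0 >> 48) & MODE_MASK


def error_action(resident_exit_address, resident_mode):
    """Decide between aborting and exiting to the resident (non-stop errors)."""
    if resident_exit_address == 0 or resident_mode:
        return "abort"
    return "xit to resident"
```

A RESIDENT ROUTINE RECEIVING CONTROL THIS WAY COULD USE unpack_x0 TO RECOVER
THE FAULTING PROCESSOR'S P REGISTER AND THE REASON FOR THE EXIT.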

IN CASE OF A PROGRAM STOP, WITH A ZERO RESIDENT EXIT ADDRESS,
SOAPSUDS WILL SIMPLY STOP THE PROCESSOR, PRINTING NO MESSAGE UNLESS A
TRACE OF OPCODE 00 WAS REQUESTED. IF THE RESIDENT EXIT ADDRESS IS
NON-ZERO, THE USUAL EXIT PROCEDURE WILL BE FOLLOWED (ALTHOUGH NO SNAP
OR SOAPSNAP WILL APPEAR).

CALLING SOAPSUDS OPTIONS -- AUTOMATIC TRAPPING

FOR REASONS OF SIMPLICITY, SOAPSUDS OPTION-SETTING ROUTINES CANNOT
BE EXECUTED UNDER SIMULATION, BUT MUST BE EXECUTED DIRECTLY, IN A TRAP
(SEE BELOW). TO RELIEVE THE USER OF THE NECESSITY OF TRAPPING EVERY
TIME AN OPTION IS TO BE CALLED, SOAPSUDS AUTOMATICALLY RECOGNIZES A CALL
TO A SOAPSUDS OPTION, AND GENERATES A TRAP TO THE OPTION-SETTING ROUTINE
INSTEAD OF SIMULATING THE CALL (THUS EXECUTING THE ROUTINE DIRECTLY).
THE ORDINARY USER NEED NOT BE AWARE OF THIS MECHANISM, BUT MERELY KNOW
THAT ANY SOAPSUDS OPTION CAN BE CALLED BEFORE INITIATING SOAPSUDS,
UNDER SIMULATION, OR IN A TRAP ROUTINE. THE ONLY DIFFERENCE IN CALLING
A SOAPSUDS OPTION BEFORE INITIALIZING SOAPSUDS IS THAT, IN THE EVENT
OF AN IMPROPER PARAMETER IN THE CALL, NO MESSAGE WILL BE PRINTED. IN
GENERAL, AN ERRONEOUS CALL WILL BE IGNORED, AND WILL NOT AFFECT FURTHER
USE OF THE OPTION.

A NUMBER OF OTHER SUBROUTINES ARE INCLUDED IN THE LIST OF
SUBPROGRAMS, CALLS TO WHICH ARE TO GENERATE TRAPS. THIS LIST INCLUDES
THE FORTRAN I-O ROUTINES, WHICH ARE OF COURSE NOT CODED TO PERMIT
MANIPULATION OF THE BUFFERS BY SEVERAL PROCESSORS SIMULTANEOUSLY. OTHER
ROUTINES, BOTH OF THE EXTERNAL OPERATING SYSTEM AND OF SUCH A
MULTIPROCESSOR OPERATING SYSTEM AS MAY BE USED, CAN BE INCLUDED. THE LIST
IS IN SYMBOLIC (AS A LIST OF RETURN JUMPS) AT THE END OF THE SOAPSDS
CONNECTING SUBROUTINE, AND SO MAY BE MODIFIED BY THE USER AS DESIRED.

USING THE TRAPPING AND TRACING SYSTEM

AS WAS MENTIONED ABOVE, SOAPSUDS IS PROVIDED WITH A SYSTEM FOR TRAPPING
AND/OR TRACING OPCODES, LOADS, STORES, OR THE EXECUTION OF PARTICULAR
INSTRUCTIONS.

ALL FOUR TYPES OF TRAPS, WHICH ARE SET AND TURNED OFF AS DESCRIBED
IMMEDIATELY BELOW, ALWAYS TRAP BEFORE EXECUTING (I.E., SIMULATING) THE
INSTRUCTION WHICH CAUSED THE TRAP. THUS, AN OPCODE TRAP, WHICH CAUSES
A TRAP WHENEVER A PARTICULAR OPCODE IS EXECUTED BY A PARTICULAR CENTRAL
PROCESSING UNIT, ALWAYS OCCURS BEFORE THE INSTRUCTION WITH THAT OPCODE
IS EXECUTED. A CONTROL TRAP, WHICH CAUSES A TRAP WHENEVER AN INSTRUCTION
AT A SPECIFIED LOCATION IS EXECUTED, ALWAYS OCCURS BEFORE ANY OF THE
INSTRUCTIONS AT THAT LOCATION ARE EXECUTED (SIMULATED). LOAD AND STORE
TRAPS, WHICH TRAP ON LOADS FROM OR STORES TO PARTICULAR LOCATIONS, ALSO
OCCUR BEFORE THE LOAD OR STORE TAKES PLACE.

A FACILITY FOR TRAPPING ON A STORE AFTER THE STORE HAS TAKEN PLACE
IS AVAILABLE THROUGH THE CHECK OPTION (SEE BELOW).


SETTING  AND  TURNING  OFF  TRAPS 


TO TURN ON A TRAP,

      CALL TRAP(TYPE,ADDRESS,TRAPADR,CPU)

WHERE   TYPE    IS THE TYPE OF TRAP TO BE SET
                =1 FOR AN OPCODE TRAP
                =2 FOR A CONTROL TRAP
                =3 FOR A LOAD TRAP
                =4 FOR A STORE TRAP
        ADDRESS IS THE NUMBER ON WHICH THE TRAP IS TO OCCUR
                FOR AN OPCODE TRAP, THE OPCODE
                FOR A CONTROL TRAP, THE LOCATION OF THE INSTRUCTION
                FOR A LOAD TRAP, THE LOCATION FROM WHICH THE LOAD IS MADE
                FOR A STORE TRAP, THE LOCATION TO WHICH THE STORE IS MADE
        TRAPADR IS THE ADDRESS TO WHICH THE TRAP IS MADE WHEN THE TRAP
                CONDITION IS MET. SOAPSUDS PERFORMS A  RJ TRAPADR  WHEN
                IT TRAPS, SO THE LOCATION TRAPADR SHOULD BE BLANK, AND THE
                FIRST EXECUTABLE INSTRUCTION OF THE TRAP ROUTINE SHOULD BE
                AT TRAPADR+1.
        CPU     IS ONLY APPLICABLE ON OPCODE TRAPS. IT INDICATES THE
                NUMBER OF THE PROCESSOR TO WHICH THE OPCODE TRAP APPLIES.
ADDITIONAL NOTES

1. TO INDICATE 'EVERY' (EVERY OPCODE, EVERY ADDRESS, OR EVERY
PROCESSOR), USE THE CONSTANT -64 (777677B). FOR EXAMPLE, TO TRAP TO
ABSURD BEFORE THE EXECUTION OF EVERY INSTRUCTION,

      CALL TRAP(1,-64,ABSURD,-64)

2. ANY EVENT MAY CAUSE AT MOST ONE TRAP OF EACH TYPE. IF TWO TRAPS
OF THE SAME TYPE ON THE SAME EVENT ARE SET, THE ONE SET FIRST IS THE ONE
WHICH WILL BE EXECUTED. HOWEVER, AN 'EVERY' TRAP (A TRAP ON EVERY
OPCODE FOR EVERY PROCESSOR, OR ON EVERY LOCATION) HAS PRECEDENCE OVER ALL
OTHER TRAPS.

3. IT IS POSSIBLE FOR ONE INSTRUCTION TO LEAD TO SEVERAL TRAPS
(THOUGH ONLY ONE OF EACH TYPE). IF AN INSTRUCTION CAUSES MORE THAN ONE
TRAP, THE ORDER OF TRAPS WILL ALWAYS BE: FIRST, CONTROL TRAP; SECOND,
OPCODE TRAP; THIRD, LOAD OR STORE TRAP.

4. TRAPADR IS NOT CHECKED IN ANY WAY.

5. CPU NEED NOT BE GIVEN FOR TRAPS OTHER THAN OPCODE TRAPS.

TO TURN OFF A TRAP,

      CALL TRAPOFF(TYPE,ADDRESS,0,CPU)

WHERE TYPE, ADDRESS AND CPU ARE THE SAME VALUES WHICH WERE USED IN
CALLING TRAP WHEN THE TRAP WAS TURNED ON.

ADDITIONAL NOTES

1. AS FOR TRAP, B4 (CPU) NEED NOT BE SET FOR TRAPS OTHER THAN OPCODE TRAPS.

2. IF TWO TRAPS HAVE THE SAME VALUES OF TYPE, ADDRESS (AND CPU, IF
OPCODE TRAPS), THE FIRST ONE WHICH WAS SET WILL BE TURNED OFF.
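
THE PRECEDENCE RULES FOR TRAP AND TRAPOFF CAN BE MODELED WITH A SMALL PYTHON
TABLE. THIS IS A SKETCH OF THE RULES AS STATED IN THE NOTES, NOT OF THE
ACTUAL SOAPSUDS DATA STRUCTURES. THE CLASS AND METHOD NAMES ARE HYPOTHETICAL,
HANDLERS ARE ARBITRARY PYTHON VALUES, AND CAPACITY LIMITS ARE NOT MODELED.

```python
EVERY = -64                              # the 'EVERY' constant
OPCODE, CONTROL, LOAD, STORE = 1, 2, 3, 4


class TrapTable:
    def __init__(self):
        self._entries = []               # (type, address, handler, cpu), oldest first

    def trap(self, type_, address, handler, cpu=EVERY):
        """Set a trap; CPU only matters for opcode traps."""
        self._entries.append((type_, address, handler, cpu))

    def trapoff(self, type_, address, cpu=EVERY):
        """Turn off the first-set trap with the same TYPE, ADDRESS (and CPU)."""
        for index, (t, a, _, c) in enumerate(self._entries):
            if t == type_ and a == address and (t != OPCODE or c == cpu):
                del self._entries[index]
                return True
        return False

    def lookup(self, type_, address, cpu):
        """Return the handler that would fire for this event, or None."""
        matches = [e for e in self._entries
                   if e[0] == type_ and e[1] in (address, EVERY)
                   and (type_ != OPCODE or e[3] in (cpu, EVERY))]
        for e in matches:
            # An 'EVERY' trap takes precedence over all other traps.
            if e[1] == EVERY and (type_ != OPCODE or e[3] == EVERY):
                return e[2]
        return matches[0][2] if matches else None
```

WITH TWO IDENTICAL OPCODE TRAPS SET, lookup RETURNS THE FIRST; ONCE A TRAP ON
EVERY OPCODE FOR EVERY PROCESSOR IS ADDED, THAT TRAP IS RETURNED INSTEAD.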


TRAPS SERVE SEVERAL BASIC PURPOSES. THEY MAY BE USED TO COUNT THE
FREQUENCY OF OPCODES, LOADS, OR STORES. TRAPS MAY BE USED TO TURN ON OR
OFF OTHER OPTIONS AT PARTICULAR PLACES IN THE PROGRAM, OR TO CHECK THE
VALUES OF SPECIAL LOCATIONS AT PARTICULAR INSTANTS. AND THEY MAY BE USED
TO PRINT SPECIAL MESSAGES IN PLACE OF THE STANDARD TRACES IF DESIRED.

USING TRAP ROUTINES OFFERS SEVERAL ADVANTAGES. FIRST, CODING WHICH
WOULD HAVE TO OCCUR AT A LARGE NUMBER OF POINTS IN A PROGRAM NEED NOT BE
REPEATED (AS AN EXTREME EXAMPLE, TO DETERMINE OPCODE FREQUENCIES WITHOUT
OPCODE TRAPS WOULD REQUIRE A ROUTINE AFTER EVERY INSTRUCTION OF THE PROGRAM).
SECOND, SINCE TRAP ROUTINES ARE EXECUTED AND NOT SIMULATED, THEY ARE
MUCH FASTER, WHICH IS IMPORTANT FOR LENGTHY PROCEDURES SUCH AS I-O. SINCE TRAP
ROUTINES ARE NOT SIMULATED, THE MEMORY BOUNDS OF COURSE DO NOT APPLY TO THEM.

INFORMATION AVAILABLE DURING TRAPS

WHEN SOAPSUDS PERFORMS A TRAP, CERTAIN INFORMATION IS LEFT IN THE
REGISTERS WHICH MAY BE USEFUL DURING THE TRAP ROUTINE. THE CONTENTS OF SOME
OF THE REGISTERS UPON TRAPPING ARE LISTED BELOW:

IN B7 IS THE NUMBER OF THE PROCESSOR PRESENTLY EXECUTING, THE ONE
      WHICH CAUSED THE TRAP
IN B5 IS THE OPCODE OF THE INSTRUCTION PRESENTLY BEING EXECUTED
IN A1 IS THE INSTRUCTION LOCATION COUNTER OF THE PROCESSOR PRESENTLY
      EXECUTING
IN B1, FOR LOAD AND STORE TRAPS, IS THE LOCATION TO WHICH THE LOAD OR
      STORE IS TO BE MADE

EXITING  FROM  A  TRAP  ROUTINE 

NORMAL EXIT FROM A TRAP ROUTINE IS JUST LIKE NORMAL EXIT FROM ANY
SUBROUTINE, A JUMP TO THE FIRST WORD OF THE ROUTINE (TRAPADR). THIS WORD
WILL CONTAIN A JUMP BACK INTO THE SOAPSUDS SIMULATION ROUTINES. IF A
MESSAGE IS DESIRED AT THE END OF THE TRAP ROUTINE (THE SAME MESSAGE AS IS
PRODUCED BY A TRACE) THE USER MUST RETURN TO THE LOCATION AFTER THE ONE
IN THE JUMP INSTRUCTION AT TRAPADR. IN OTHER WORDS, HE MUST EXECUTE

      SA1    TRAPADR

      LX1    30

      SB1    X1

      JP     B1+1
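
THE EFFECT OF THIS FOUR-INSTRUCTION SEQUENCE CAN BE FOLLOWED IN A PYTHON
SKETCH. IT ASSUMES THAT THE RETURN JUMP LEAVES THE RETURN ADDRESS IN BITS
30-47 OF THE WORD AT TRAPADR AND THAT THE SHIFT IS A 30-PLACE LEFT CIRCULAR
SHIFT OF A 60-BIT WORD; BOTH ARE ASSUMPTIONS ABOUT THE MACHINE, AND THE
FUNCTION NAMES ARE HYPOTHETICAL.

```python
WORD_MASK = (1 << 60) - 1
ADDR_MASK = (1 << 18) - 1


def rotate_left_60(word, count):
    """Left circular shift of a 60-bit word (the LX1 step)."""
    count %= 60
    return ((word << count) | (word >> (60 - count))) & WORD_MASK


def message_exit_address(jump_word):
    """Address reached by the SA1 / LX1 / SB1 / JP sequence."""
    x1 = rotate_left_60(jump_word, 30)   # bring bits 30-47 down to bits 0-17
    b1 = x1 & ADDR_MASK                  # SB1 X1 keeps the low 18 bits
    return b1 + 1                        # JP B1+1
```

FOR A JUMP WORD WHOSE ADDRESS FIELD HOLDS 012345 (OCTAL), THE SEQUENCE
TRANSFERS CONTROL TO 012346 (OCTAL), THE LOCATION AFTER THE NORMAL RETURN
POINT.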

SETTING  AND  TURNING  OFF  TRACES 



AS WAS MENTIONED ABOVE, TRACES IN SOAPSUDS, WITH THE EXCEPTION OF
OPCODE TRACES, ARE ACTUALLY SPECIAL TRAPS WHICH TRAP TO ROUTINES WHICH
PRINT OUT THE REQUIRED TRACES. AS A RESULT, THE CALLING SEQUENCES FOR
TURNING ON AND OFF TRACES ARE VERY SIMILAR TO THE SEQUENCES FOR TURNING
ON AND OFF TRAPS.

TO TURN ON A TRACE,

      CALL TRACE(TYPE,ADDRESS,NAME,CPU)

WHERE TYPE, ADDRESS, AND CPU ARE THE SAME VALUES AS FOR A CALL TO TRAP, I.E.,
        TYPE    IS THE TYPE OF TRACE TO BE SET
                =1 FOR AN OPCODE TRACE
                =2 FOR A CONTROL TRACE
                =3 FOR A LOAD TRACE
                =4 FOR A STORE TRACE
        ADDRESS IS THE NUMBER ON WHICH THE TRACE IS TO OCCUR
                FOR AN OPCODE TRACE, THE OPCODE
                FOR A CONTROL TRACE, THE LOCATION OF THE INSTRUCTION
                FOR A LOAD TRACE, THE LOCATION FROM WHICH THE LOAD IS MADE
                FOR A STORE TRACE, THE LOCATION TO WHICH THE STORE IS MADE
        CPU     IS ONLY APPLICABLE ON OPCODE TRACES. IT INDICATES THE
                NUMBER OF THE PROCESSOR TO WHICH THE OPCODE TRACE APPLIES
        NAME    IS APPLICABLE TO CONTROL, LOAD, AND STORE TRACES. IT
                SHOULD BE THE ADDRESS OF A WORD WHICH CONTAINS, IN THE
                UPPER (LEFT) 48 BITS, THE NAME OF THE LOCATION, IN DISPLAY
                CODE. THESE EIGHT CHARACTERS WILL BE PRINTED ALONG WITH
                THE CONTROL, LOAD, OR STORE TRACE PRINT-OUT. THE
                LOW-ORDER (RIGHTMOST) CHARACTER IN THE WORD SHOULD INDICATE,
                FOR LOAD AND STORE TRACES, THE FORMAT IN WHICH THE
                CONTENTS OF THE WORD BEING TRACED SHOULD BE PRINTED OUT.
                THUS, IF THE RIGHTMOST CHARACTER IS AN E, THE WORD WILL
                BE PRINTED OUT IN E FORMAT, IF AN F, IN F FORMAT, IF AN A,
                IN A10 FORMAT, IF AN L, IN L10 FORMAT, IF AN R, IN R10
                FORMAT, AND IF AN I, IN I17 FORMAT. THE LETTER O AND MOST
                OTHER LETTERS WILL CAUSE PRINTOUT IN O20 FORMAT.
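
THE LAYOUT OF THE NAME WORD CAN BE ILLUSTRATED IN PYTHON. THIS SKETCH COVERS
ONLY LETTERS, DIGITS, AND BLANK IN DISPLAY CODE, AND IT FILLS THE NINTH
CHARACTER POSITION, WHICH THE DESCRIPTION LEAVES UNSPECIFIED, WITH A BLANK;
THAT CHOICE AND THE FUNCTION NAMES ARE ASSUMPTIONS.

```python
def display_code(ch):
    """Six-bit display code for a letter, digit, or blank."""
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 1          # A-Z are 01-32 octal
    if "0" <= ch <= "9":
        return ord(ch) - ord("0") + 0o33       # 0-9 are 33-44 octal
    if ch == " ":
        return 0o55
    raise ValueError("character not covered by this sketch: %r" % ch)


def pack_name_word(name, format_letter):
    """Pack an 8-character name and a format letter into one 60-bit word."""
    # Assumption: the ninth character position is filled with a blank.
    chars = name.upper().ljust(8)[:8] + " " + format_letter.upper()
    word = 0
    for ch in chars:
        word = (word << 6) | display_code(ch)  # ten characters, 6 bits each
    return word
```

FOR EXAMPLE, pack_name_word("ARRAY", "I") PRODUCES A WORD WHOSE LEFTMOST
CHARACTER IS A AND WHOSE RIGHTMOST CHARACTER IS I, REQUESTING INTEGER FORMAT.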
ADDITIONAL NOTES

1. THE INDICATION FOR 'EVERY' IS THE SAME AS IT IS FOR SETTING
TRAPS, -64.

2. ANY EVENT MAY CAUSE EITHER A TRACE OR A TRAP OF ANY GIVEN TYPE,
BUT NOT BOTH. TO BOTH TRACE AND TRAP, SET A TRAP AND PRINT A TRACE
BY THE MEANS DESCRIBED UNDER 'EXITING FROM A TRAP ROUTINE', ABOVE. IF A TRACE
AND A TRAP WERE SET ON A GIVEN LOCATION, THE ONE WHICH WAS SET FIRST WILL
BE EXECUTED. HOWEVER, AN 'EVERY' TRAP OR TRACE HAS PRECEDENCE OVER
ALL OTHER TRAPS AND TRACES.

3. ONE INSTRUCTION MAY PRODUCE SEVERAL TRAPS AND TRACES (THOUGH ONLY
ONE OF EACH TYPE). THE ORDER IN WHICH THE TRAPS AND TRACES APPEAR IS
THE SAME AS THE ORDER FOR A COMBINATION OF TRAPS, VIZ.: FIRST, CONTROL
TRAP OR TRACE; SECOND, OPCODE TRAP OR TRACE; THIRD, LOAD OR STORE TRAP
OR TRACE.

4. ON CONTROL, LOAD, AND STORE TRAPS, THERE IS A LIMIT OF FIFTY
TRACES AND TRAPS OF EACH TYPE. IF AT ANY TIME FIFTY TRACES AND TRAPS
OF A TYPE ARE SET, ALL FURTHER CALLS TO TRAP OR TRACE FOR THAT TYPE WILL
BE IGNORED. THE LIMIT OF FIFTY ALSO APPLIES TO OPCODE TRAPS, BUT NOT TO
OPCODE TRACES. OPCODE TRACES ARE HANDLED SPECIALLY, SO THAT THERE IS NO
LIMIT ON THE NUMBER OF OPCODE TRACES WHICH MAY BE SET.

5. WARNING: WHEN TRACE IS CALLED, SOAPSUDS RECORDS THE ADDRESS OF
NAME, BUT DOES NOT COPY ITS CONTENTS INTO A REGION INSIDE SOAPSUDS. THUS,
IN ORDER TO PRESERVE THE NAME FOR FUTURE PRINT OUT, THE VARIABLE 'NAME'
SHOULD NOT BE ALTERED AFTER THE CALL TO TRACE.

UNDER THE SOAPSUDS SYSTEM, TURNING OFF A TRACE IS LOGICALLY
INDISTINGUISHABLE FROM TURNING OFF A TRAP. THE EXACT SAME ARGUMENTS TO
TRAPOFF SHOULD BE USED AS WHEN TURNING OFF A TRAP. THUS, TO TURN OFF A
TRACE,

      CALL TRAPOFF(TYPE,ADDRESS,0,CPU)

WHERE THE VALUES OF TYPE, ADDRESS, AND CPU ARE THE SAME AS WERE USED IN
TURNING ON THE TRACE. IF THERE ARE SEVERAL TRAPS AND/OR TRACES WITH THE
SAME VALUES OF TYPE, ADDRESS (AND CPU), THE FIRST TRAP/TRACE WHICH WAS
TURNED ON WILL BE THE ONE TURNED OFF.

TRACE FORMATS

THE FORMATS OF THE CONTROL, LOAD, AND STORE TRACES ARE QUITE
SELF-EXPLANATORY. THEY ARE GIVEN BELOW FOR REFERENCE.

STORE TRACE

PROCESSOR 23 AT P=00721? PERFORMED A STORE FROM X6 INTO ARRAY (013277)
LOCN=77777777776777777777 (FORMERLY 0^111700000000004321)

LOAD TRACE

PROCESSOR 23 AT P=007642 PERFORMED A LOAD FROM BUFFER (002226) INTO
X4 X4=01022400000000000000

CONTROL TRACE

PROCESSOR 23 HAS TRANSFERRED CONTROL TO LOCATION LOOP2 (000212)

THE OPCODE TRACES, ON THE OTHER HAND, HAVE A NUMBER OF DIFFERENT FORMATS
FOR THE DIFFERENT TYPES OF INSTRUCTIONS. ALL THE OPCODE TRACE FORMATS,
HOWEVER, HAVE SOME COMMON CHARACTERISTICS. AT THE LEFT SIDE OF THE
OUTPUT APPEAR THREE OCTAL NUMBERS. THE FIRST, A SIX-DIGIT NUMBER, INDICATES
THE ADDRESS OF THE INSTRUCTION PRESENTLY BEING EXECUTED. THE SECOND, A
SINGLE OCTAL DIGIT, INDICATES THE QUARTER OF THE WORD IN WHICH THE OPCODE
APPEARS. A 4 SIGNIFIES THE OPCODE IS IN THE HIGH-ORDER BITS (LEFTMOST
QUARTER OF THE WORD), AND A 1 SIGNIFIES THE OPCODE IS IN THE RIGHTMOST
QUARTER OF THE WORD. THIRD APPEARS A TWENTY-DIGIT NUMBER, THE WORD
CONTAINING THE INSTRUCTION PRESENTLY BEING EXECUTED. AFTER THESE THREE NUMBERS
APPEAR THE MNEMONICS FOR THE INSTRUCTION BEING EXECUTED. THEN APPEAR
THE VALUES OF THE ENTRY OPERAND REGISTERS, AND AT THE RIGHT EDGE OF THE
OUTPUT IS GIVEN THE RESULT OF THE OPERATION. FOR THE THREE INSTRUCTIONS
WHICH HAVE TWO RESULT REGISTERS (OPCODES 24-26), ONE OF THE RESULT
REGISTERS APPEARS IN THE OPERAND COLUMN. WHENEVER A SET A INSTRUCTION
IS EXECUTED, THE CONTENTS OF THE ASSOCIATED X REGISTER IS ALSO PRINTED
OUT. ABOVE EACH OPCODE TRACE APPEARS THE LINE

TRACE OF PROCESSOR NUMBER (ASSIGNED BADGENUMBER) 23

WHERE THE PROCESSOR NUMBER (HERE AND IN THE TRACES ABOVE) IS ALWAYS IN OCTAL.
A SAMPLE OPCODE TRACE IS

001071  4  6l4600725ft0612noi077    SB4    B6+7756    B6=000072       B4=001171

IN GENERAL, IN SOAPSUDS TRACES, ADDRESSES AND THE K (CONSTANT) PORTION
OF INSTRUCTIONS ARE ALWAYS WRITTEN OUT IN OCTAL. REGISTERS ARE ALWAYS
WRITTEN OUT IN OCTAL, AND ARE ALSO WRITTEN OUT IN INTEGER OR E FORMAT
FOR INTEGER AND FLOATING ARITHMETIC INSTRUCTIONS RESPECTIVELY.

TRACING  AND  TRAPPING  ON  NO-OPS 

A USER WILL OFTEN WISH A COMPLETE OPCODE TRACE (ALL OPCODES) OF
A PARTICULAR PROCESSOR, BUT NOT WANT A TRACE OF EVERY NO-OP, WHICH WILL
GENERALLY CONSTITUTE A CONSIDERABLE FRACTION OF THE OUTPUT, BUT ADD NO
USEFUL INFORMATION. TO AVOID HAVING TO TURN OFF TRACES ON OPCODE 46
(NO-OP) EVERY TIME, AN OPTION HAS BEEN INCLUDED WHICH WILL PERMANENTLY
SUPPRESS TRACING OF AND TRAPPING ON NO-OPS. IT IS CALLED SIMPLY BY

      CALL NONOOP

SUCCESSIVE CALLS TO NONOOP WILL TURN THE OPTION ALTERNATELY OFF AND ON.

SUSPENDING TRAPS AND TRACES

IN ORDER TO SUSPEND ALL TRAPS AND TRACES OF A PARTICULAR TYPE
TEMPORARILY WITHOUT TURNING ALL THE TRAPS AND TRACES OFF, AND HAVING TO
TURN THEM INDIVIDUALLY ON AGAIN, TWO OPTIONS HAVE BEEN PROVIDED IN
SOAPSUDS, SUSPEND AND RESUME.

ALL TRAPS AND TRACES OF A GIVEN TYPE MAY BE SUSPENDED BY

      CALL SUSPEND(TYPE)

WHERE   TYPE    IS THE TYPE OF TRAPPING AND TRACING TO BE SUSPENDED
                =1 FOR OPCODE TRACES AND TRAPS
                =2 FOR CONTROL TRACES AND TRAPS
                =3 FOR LOAD TRACES AND TRAPS
                =4 FOR STORE TRACES AND TRAPS

THE CORRESPONDING CALL TO TURN BACK ON TRACES AND TRAPS OF A SPECIFIED
TYPE AFTER THEY HAVE BEEN SUSPENDED IS

      CALL RESUME(TYPE)

WHERE   TYPE    HAS THE SAME VALUE AS IT DID IN TURNING OFF THE TRAPS
                AND TRACES OF THAT TYPE.

IF NO TRAPS OR TRACES OF A GIVEN TYPE ARE SET WHEN A SUSPEND IS
EXECUTED, THE SUSPEND WILL HAVE NO EFFECT. HOWEVER, EXECUTING A RESUME
FOR A TYPE OF TRAP FOR WHICH NO TRAPS OR TRACES ARE SET WILL MAKE THE
TRAPPING ROUTINE BELIEVE THAT IT ACTUALLY HAS SOME TRAPS SET, AND THUS
CAN CAUSE CONFUSION.

LOCATION CHECKING OPTIONS

AS WAS MENTIONED ABOVE, SOAPSUDS HAS TWO OPTIONS FOR THE CONSTANT
MONITORING OF LOCATIONS, CHECK AND CORRECT. BOTH OPTIONS, ONCE TURNED
ON, RECHECK THE VALUE OF THE SPECIFIED WORDS WHENEVER A STORE TO THOSE
LOCATIONS IS MADE.

THE CHECK OPTION

THE CHECK OPTION AUTOMATICALLY CHECKS WHETHER A SPECIFIED LOCATION
BEARS A SPECIFIED RELATION TO OTHER SPECIFIED LOCATION(S). IF THE RELATION
IS NOT MET, TWO COURSES OF ACTION ARE AVAILABLE. EITHER A MESSAGE CAN BE
PRINTED OUT, OR A TRAP CAN BE MADE TO A SPECIFIED LOCATION. UNLIKE A
STORE TRAP, A CHECK TRAP IS MADE AFTER THE STORE OCCURS.

TO TURN ON A CHECK,

      CALL CHECK(LOCN,NAME,REL,TRAPADR)

WHERE   LOCN    IS THE ADDRESS WHICH IS TO BE CHECKED
        NAME    IS THE ADDRESS OF A WORD IN WHOSE LEFT 48 BITS IS THE NAME
                OF THIS LOCATION, IN DISPLAY CODE. THESE EIGHT CHARACTERS
                WILL BE PRINTED OUT WHENEVER A CHECK MESSAGE FOR THIS
                LOCATION APPEARS. THE WORDS AT NAME+1 AND NAME+2 ARE
                USED IN COMPARISONS AGAINST THE CONTENTS OF LOCN
        REL     SPECIFIES THE RELATION TO BE TESTED FOR.
                THE RELATION IS
                FOR REL=1, TRUE IF (LOCN).EQ.(NAME+1)
                FOR REL=2, TRUE IF (LOCN).NE.(NAME+1)
                FOR REL=3, TRUE IF (LOCN).GE.(NAME+1)
                FOR REL=4, TRUE IF (LOCN).LT.(NAME+1)
                FOR REL=5, TRUE IF (LOCN).GE.(NAME+1) AND
                                   (LOCN).LT.(NAME+2), I.E., 'BETWEEN'
                FOR REL=6, TRUE IF (LOCN).LT.(NAME+1) OR
                                   (LOCN).GE.(NAME+2), I.E., 'NOT BETWEEN'
                FOR REL=ANYTHING ELSE, AN ERROR MESSAGE IS PRINTED WHEN
                CHECK IS CALLED
        TRAPADR IS THE ADDRESS TO WHICH THE TRAP IS MADE IF THE RELATION
                IS NOT SATISFIED. AS WITH OTHER TRAPS, A  RJ TRAPADR
                IS PERFORMED IN ORDER TO TRAP. HENCE THE FIRST EXECUTABLE
                STATEMENT OF THE TRAP ROUTINE SHOULD BE AT TRAPADR+1.
                EXIT FROM THE TRAP ROUTINE IS THE SAME AS EXIT FROM OTHER
                TYPES OF TRAPS (SEE ABOVE). IN THIS CASE, THE EXIT TO
                THE LOCATION AFTER THE ONE SPECIFIED IN THE JUMP STORED
                AT TRAPADR PRODUCES A 'CHECK MESSAGE.' IF TRAPADR IS
                ZERO NO TRAP WILL OCCUR, BUT THE 'CHECK MESSAGE' WILL BE
                PRINTED.

THE FORMAT OF THE 'CHECK MESSAGE' IS

PROCESSOR 23 AT P=001726 CHECKED VALUE OF LOCNAME (007555) AND FOUND THE
ILLEGAL VALUE 00000000000000000015B (13)

THE CORRECT OPTION

THE CORRECT OPTION AUTOMATICALLY CHECKS WHETHER THE CONTENTS OF A
SPECIFIED LOCATION IS EQUAL TO THE CONTENTS OF ANOTHER SPECIFIED LOCATION,
WHENEVER A STORE IS MADE TO THE FIRST LOCATION. IF THE CONTENTS OF THE
LOCATION BEING MONITORED IS SET TO A VALUE DIFFERENT FROM THE CONTENTS OF
THE OTHER WORD (THE 'CORRECT VALUE'), A MESSAGE WILL BE PRINTED AND THE
LOCATION BEING MONITORED IS RESET TO THE CORRECT VALUE.

THE  CALLING  SEQUENCE  IS  THE  SAME  AS  FOR  CHECK.  EXC-PT  B3  (RED  MUST  BE 
NEGATIVE 

CALL   CHFC<(L0CN,NAME,-1) 

WHERE   LOCN     IS  THE  ADDRFS*^  WHICH  IS  TO  «■="  CHECK<^0 

NAME     IS  THE  ADDRESS  OF  A  WORD  IN  WHOSE  LEFT  A8  BITS  IS  THF  NAME 
OF  THIS  LOCATION*  IN  DI-^PLAY  CODE.   THESE  EIGHT  CHARACTERS 
WILL  BE  PRINTED  OUT  WHENEVER  A  CORRi^CT  M-SSAGE  FOR  ^HIS 
LOCATION  APPEARS.   THE  CONTENTS  OF  NAME+1  1=  TA<EN  AS  THE 
•CORRECT  VALUE'  WHICH  IS  COMPARED  AGAINST  LOCN  WHENEVER  A 
STORE  IS  MADE  TO  LOCN. 
THE FORMAT OF THE CORRECT MESSAGE IS

PROCESSOR 23 AT P=001726 CORRECTED LOCNAME (007555) FROM
00000000000000000013B (     11) TO 00000000000000000015B (     13)

THE PROCESSOR NUMBER AND ADDRESSES ARE IN OCTAL, AND THE CONTENTS ARE
WRITTEN OUT IN BOTH OCTAL AND INTEGER FORMATS.
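
THE BEHAVIOR OF A CORRECT ON A STORE CAN BE SKETCHED AS FOLLOWS. THIS IS A
MINIMAL PYTHON MODEL UNDER STATED ASSUMPTIONS, NOT THE SOAPSUDS CODE. THE
CLASS NAME CorrectMonitor, THE DICTIONARY USED AS MEMORY, AND THE EXACT
COLUMN WIDTHS OF THE MESSAGE ARE ALL HYPOTHETICAL.

```python
class CorrectMonitor:
    """Hypothetical model of a CORRECT set on one location."""

    def __init__(self, memory, locn, name, correct_addr):
        self.memory = memory              # dict: address -> integer contents
        self.locn = locn                  # monitored address (LOCN)
        self.name = name                  # label printed in the message
        self.correct_addr = correct_addr  # address holding the 'correct value'
        self.messages = []

    def store(self, processor, p, value):
        """Store `value` into LOCN, resetting it if it is not the correct value."""
        self.memory[self.locn] = value
        correct = self.memory[self.correct_addr]
        if value != correct:
            # Processor number and addresses in octal, contents in octal and decimal.
            self.messages.append(
                "PROCESSOR %o AT P=%06o CORRECTED %s (%06o) FROM "
                "%020oB (%7d) TO %020oB (%7d)"
                % (processor, p, self.name, self.locn,
                   value, value, correct, correct)
            )
            self.memory[self.locn] = correct


memory = {0o7555: 13, 0o7556: 13}         # LOCN and the word at NAME+1
monitor = CorrectMonitor(memory, 0o7555, "LOCNAME", 0o7556)
monitor.store(0o23, 0o1726, 11)           # a bad store of 11 is corrected to 13
```

AFTER THE BAD STORE THE MONITORED WORD AGAIN HOLDS 13, AND ONE MESSAGE HAS
BEEN RECORDED IN monitor.messages.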

TO TURN OFF A CHECK OR A CORRECT

TO TURN OFF EITHER A CHECK OR A CORRECT OPTION SET ON A LOCATION,

      CALL CHECKOF(LOCN)

WHERE   LOCN     IS THE ADDRESS ON WHICH THE CHECK OR CORRECT OPTION IS TO
BE TURNED OFF

ADDITIONAL NOTES

1. IF SEVERAL CHECKS AND/OR CORRECTS ARE SET ON ONE LOCATION, ONLY
THE FIRST ONE SET WILL BE EXECUTED.

2. IF SEVERAL CHECKS AND/OR CORRECTS ARE SET ON ONE LOCATION, CHECKOF
WILL TURN OFF THE ONE WHICH WAS TURNED ON FIRST.

3. A MAXIMUM OF 50 CHECKS AND CORRECTS MAY BE SET. WHEN 50 HAVE BEEN
SET, FURTHER CALLS TO CHECK WILL BE IGNORED.

4. THE CHECK AND CORRECT OPTIONS MAY BE SUSPENDED AND RESUMED JUST LIKE
THE TRAP OPTIONS, WITH CALLS TO SUSPEND AND TO RESUME. TYPE=5 FOR
CALLS TO SUSPEND AND RESUME.

SUMMARY OF TRAP, TRACE, AND LOCATION CHECKING OPTIONS

THE FOLLOWING TABLE SUMMARIZES THE ARGUMENTS OF CALLS TO THE TRAPPING,
TRACING, AND LOCATION CHECKING OPTIONS. FOR DETAILS SEE THE SPECIFIC
DISCUSSIONS OF TRAPPING, TRACING, AND LOCATION CHECKING ABOVE.


OPTION                                PAR1=  PAR2=    PAR3=    PAR4=    CALL

TO TURN ON OPCODE TRAP                1      OPCODE*  TRAPADR  CPU*     TRAP
TO TURN ON CONTROL TRAP               2      ADR*     TRAPADR  -        TRAP
TO TURN ON LOAD TRAP                  3      ADR*     TRAPADR  -        TRAP
TO TURN ON STORE TRAP                 4      ADR*     TRAPADR  -        TRAP
TO TURN ON OPCODE TRACE               1      OPCODE*  NAME     CPU*     TRACE
TO TURN ON CONTROL TRACE              2      ADR*     NAME     -        TRACE
TO TURN ON LOAD TRACE                 3      ADR*     NAME     -        TRACE
TO TURN ON STORE TRACE                4      ADR*     NAME     -        TRACE
TO TURN ON A CHECK                    LOCN   NAME     REL      TRAPADR  CHECK
TO TURN ON A CORRECT                  LOCN   NAME     -1       -        CHECK
TO TURN OFF OPCODE TRAP               1      OPCODE*           CPU*     TRAPOFF
TO TURN OFF CONTROL TRAP              2      ADR*              -        TRAPOFF
TO TURN OFF LOAD TRAP                 3      ADR*              -        TRAPOFF
TO TURN OFF STORE TRAP                4      ADR*              -        TRAPOFF
TO TURN OFF OPCODE TRACE              1      OPCODE*           CPU*     TRAPOFF
TO TURN OFF CONTROL TRACE             2      ADR*              -        TRAPOFF
TO TURN OFF LOAD TRACE                3      ADR*              -        TRAPOFF
TO TURN OFF STORE TRACE               4      ADR*              -        TRAPOFF
TO TURN OFF A CHECK                   LOCN                     -        CHECKOF
TO TURN OFF A CORRECT                 LOCN                     -        CHECKOF
TO SUSPEND OPCODE TRAPS AND TRACES    1                        -        SUSPEND
TO SUSPEND CONTROL TRAPS AND TRACES   2                        -        SUSPEND
TO SUSPEND LOAD TRAPS AND TRACES      3                        -        SUSPEND
TO SUSPEND STORE TRAPS AND TRACES     4                        -        SUSPEND
TO SUSPEND CHECKS AND CORRECTS        5                        -        SUSPEND
TO RESUME OPCODE TRAPS AND TRACES     1                        -        RESUME
TO RESUME CONTROL TRAPS AND TRACES    2                        -        RESUME
TO RESUME LOAD TRAPS AND TRACES       3                        -        RESUME
TO RESUME STORE TRAPS AND TRACES      4                        -        RESUME
TO RESUME CHECKS AND CORRECTS         5                        -        RESUME

* THE VALUE -64 MAY BE USED TO INDICATE 'ALL'.

A DASH (-) INDICATES THE PARAMETER IS IGNORED BY THE ROUTINE.

THE OTHER ABBREVIATIONS ARE EXPLAINED IN THE DETAILED DESCRIPTIONS.

TIMING OPTIONS

THE SIMULATION OF INSTRUCTIONS BY SOAPSUDS MAKES THE ASSUMPTION THAT
ALL INSTRUCTIONS, INCLUDING NO-OPS, REQUIRE THE SAME EXECUTION TIME. THIS
EXECUTION TIME IS TERMED ONE CYCLE.

SOAPSUDS SIMULATES ONE INSTRUCTION FOR EACH PROCESSOR IN ONE SIMULATED
CYCLE. THUS, IF THERE ARE THREE PROCESSORS RUNNING, SOAPSUDS WILL SIMULATE
THE FIRST INSTRUCTION FOR PROCESSOR 0, THEN THE FIRST INSTRUCTION FOR
PROCESSOR 1, THEN THE FIRST INSTRUCTION FOR PROCESSOR 2 (END OF FIRST CYCLE),
THEN THE SECOND INSTRUCTION FOR PROCESSOR 0, ETC.
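
THE INTERLEAVING JUST DESCRIBED CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS
IS A SIMPLIFIED MODEL, NOT THE SOAPSUDS SIMULATOR. THE FUNCTION NAME simulate
AND THE REPRESENTATION OF A PROGRAM AS A PLAIN LIST OF INSTRUCTIONS ARE
ASSUMPTIONS, AND BRANCHES ARE IGNORED SO THAT THE N-TH INSTRUCTION OF EACH
PROGRAM RUNS IN CYCLE N.

```python
def simulate(programs, cycles):
    """Interleave processors one instruction at a time, one round per cycle.

    programs -- list of instruction lists, one per processor (hypothetical)
    cycles   -- number of simulated cycles to run
    Returns the execution order as (cycle, processor, instruction) tuples.
    """
    order = []
    for cycle in range(cycles):                  # the first cycle is cycle 0
        for cpu, program in enumerate(programs):
            if cycle < len(program):             # this processor still has work
                order.append((cycle, cpu, program[cycle]))
    return order


# Three processors, two instructions each.
trace = simulate([["a0", "a1"], ["b0", "b1"], ["c0", "c1"]], 2)
```

IN trace, ALL THREE FIRST INSTRUCTIONS (a0, b0, c0) APPEAR UNDER CYCLE 0
BEFORE ANY SECOND INSTRUCTION APPEARS UNDER CYCLE 1.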

CYCLE TRACE

A COUNT OF THE PRESENT CYCLE NUMBER IS KEPT IN SOAPSUDS (FIRST CYCLE =
CYCLE 0). THE USER MAY OPTIONALLY PRINT OUT AT THE END OF EACH CYCLE A
MESSAGE OF THE FORM

CYCLE 1239 JUST COMPLETED

WHERE THE NUMBER OF THE CYCLE IS IN DECIMAL.
THIS OPTION IS TURNED ON BY

      CALL CYCLTRC (1)

AND TURNED OFF BY

      CALL CYCLTRC (0)


CYCLE  TRAPS 

THE USER MAY CAUSE A TRAP TO BE EXECUTED AT A PARTICULAR CYCLE COUNT,
THEREBY ENABLING HIM, FOR EXAMPLE, TO SET A TRACE JUST BEFORE THE PROGRAM
ENCOUNTERS DIFFICULTIES.

TO TRAP ON A CYCLE COUNT,

      CALL CYCTRP (CYCLE,LOCN)

WHERE CYCLE IS THE CYCLE COUNT TO BE TRAPPED ON AND LOCN IS THE LOCATION
TO WHICH THE TRAP IS MADE.
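
THE CYCLE TRAP MECHANISM CAN BE PICTURED WITH THE FOLLOWING PYTHON SKETCH.
IT IS A HYPOTHETICAL MODEL ONLY. THE CLASS CycleTraps, ITS METHODS, AND THE
on_cycle CALLBACK ARE INVENTED FOR ILLUSTRATION, AND THE SKETCH ASSUMES THAT
A TRAP SET FOR A GIVEN CYCLE FIRES BEFORE THAT CYCLE'S INSTRUCTIONS ARE
SIMULATED, WHICH THE TEXT DOES NOT STATE.

```python
class CycleTraps:
    """Hypothetical table of trap routines keyed by cycle count."""

    def __init__(self):
        self.traps = {}                    # cycle count -> list of routines

    def cyctrp(self, cycle, routine):
        """Register `routine` to be called when the count reaches `cycle`."""
        self.traps.setdefault(cycle, []).append(routine)

    def run(self, cycles, on_cycle):
        """Step through cycles 0..cycles-1, firing due traps before each one."""
        for cycle in range(cycles):
            for routine in self.traps.get(cycle, []):
                routine()
            on_cycle(cycle)


traced = []                                # cycles seen while tracing is on
state = {"tracing": False}
traps = CycleTraps()
traps.cyctrp(3, lambda: state.update(tracing=True))     # turn tracing on
traps.cyctrp(6, lambda: state.update(tracing=False))    # turn tracing off
traps.run(10, lambda c: traced.append(c) if state["tracing"] else None)
```

UNDER THESE ASSUMPTIONS ONLY CYCLES 3, 4 AND 5 ARE RECORDED IN traced.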

FOR EXAMPLE, TO TRACE ALL THE INSTRUCTIONS EXECUTED BETWEEN CYCLES 100000
AND 110000, ONE WOULD DO THE FOLLOWING

      EXTERNAL CN,CF
      CALL CYCTRP (100000,CN)
      CALL CYCTRP (110000,CF)

      SUBROUTINE CN
      CALL TRACE (1,-64,0,-64)
      RETURN
      END

      SUBROUTINE CF
      CALL TRAPOFF (1,-64,0,-64)
      RETURN
      END

TIME STATUSES

IN ADDITION TO KEEPING A COUNT OF THE PRESENT CYCLE NUMBER, SOAPSUDS
KEEPS FOUR COUNTS FOR EACH OF THE PROCESSORS: SYSTEM TIME, PROGRAM
TIME, IDLE TIME, AND TRACING TIME, ALL OF WHICH ARE GIVEN IN CYCLES.
NORMALLY THE RUNNING (PROGRAM) TIME IS INCREMENTED EACH CYCLE SO LONG AS
THE PROCESSOR IS RUNNING, AND THE IDLE TIME IS INCREMENTED EACH CYCLE ONCE
THE PROCESSOR HAS STOPPED.

BY USE OF THE SYSTEMS, USEFUL, USELESS, AND TRACING OPTIONS
THE USER CAN OVERRIDE THIS CONVENTION AND HAVE ANY OF THE TIME
COUNTERS INCREMENTED, REGARDLESS OF THE STATE OF THE PROCESSOR.
THESE OPTIONS ARE INTENDED TO GIVE USERS THE POSSIBILITY OF
DISCRIMINATING BETWEEN TIME SPENT IN CONTROLLING THE OVERALL FLOW
OF PROGRAM EXECUTION, IN THE ACTUAL EXECUTION OF PROGRAMS, IN
NONPRODUCTIVE ACTIVITY AND IN THE PERFORMANCE OF MISCELLANEOUS
TASKS, RESPECTIVELY.

THE STATUSES ARE SET AS FOLLOWS

SYSTEM  TIME STATUS       CALL SYSTEMS
PROGRAM TIME STATUS       CALL USEFUL
IDLE    TIME STATUS       CALL USELESS
TRACING TIME STATUS       CALL TRACING

THESE OPTIONS IN NO WAY AFFECT SIMULATION. REPEATED CALLS TO THEM
WILL NOT CAUSE ANY HARM.
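
THE FOUR COUNTERS KEPT FOR EACH PROCESSOR CAN BE MODELED AS IN THE PYTHON
SKETCH BELOW. THE CLASS NAME TimeCounters, THE METHOD NAMES, AND THE CHOICE
OF 'USEFUL' AS THE STATUS IN FORCE BEFORE ANY CALL ARE ASSUMPTIONS MADE FOR
ILLUSTRATION. THE SKETCH SHOWS ONLY THE OVERRIDE BEHAVIOR, NOT THE NORMAL
RUNNING/STOPPED CONVENTION.

```python
STATUSES = ("SYSTEMS", "USEFUL", "USELESS", "TRACING")


class TimeCounters:
    """Hypothetical per-processor cycle counters, one per time status."""

    def __init__(self):
        self.counts = dict.fromkeys(STATUSES, 0)
        self.status = "USEFUL"            # assumed status before any call

    def set_status(self, status):
        """Select the counter to be charged; repeated calls are harmless."""
        if status not in self.counts:
            raise ValueError("unknown time status: %r" % status)
        self.status = status

    def tick(self):
        """Charge one simulated cycle to the currently selected counter."""
        self.counts[self.status] += 1


timer = TimeCounters()
timer.tick()                              # one cycle of program time
timer.set_status("SYSTEMS")
timer.set_status("SYSTEMS")               # a repeated call changes nothing
timer.tick()
timer.tick()                              # two cycles of system time
```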


SOAPSNAP

WHEN ALL PROCESSORS HAVE STOPPED, AT THE END OF SIMULATION, SOAPSUDS
PRINTS OUT A 'SOAPSNAP', WHICH GIVES THE PRESENT STATUS OF ALL PROCESSORS
(IN THIS CASE, ALL STOPPED) AND THE RUNNING AND IDLE TIMES OF ALL THE
PROCESSORS. SUCH A SOAPSNAP MAY OPTIONALLY BE PRINTED OUT AT ANY TIME
DURING SIMULATION, IF, FOR EXAMPLE, THE TIME REQUIRED FOR INDIVIDUAL ROUTINES
IS BEING MEASURED.

TO SOAPSNAP WHILE IN A TRAP ROUTINE OR UNDER SIMULATION,

      CALL SOAPSNP

OR, FOR EXAMPLE, TO SOAPSNAP ON EXITING FROM A SUBROUTINE CALLED SPS,

      CALL TRAP (2,SPS,SOAPSNP)


THE USER MAY PRINT OUT A SNAP OF THE CURRENT REGISTER INFORMATION
OF ANY CPU BY A CALL TO SNAP. IN ADDITION TO THE CONTENTS OF THE X, A
AND B REGISTERS, THE CURRENT P REGISTER, INSTRUCTION, STATUS (RUNNING
OR STOPPED), MODE (PROGRAM OR RESIDENT), CYCLE COUNT AND THE TOTAL TIME
SPENT IN EACH OF THE TIME STATUSES ARE PRINTED OUT.

SOAPSUDS ALWAYS PRINTS A SNAP ON THE DETECTION OF AN ERROR CONDITION,
AFTER THE PRINTOUT DESCRIBING THE CONDITION. TO SNAP,

      CALL SNAP

ABORT OPTION

IF THE USER HAS ERROR-CHECKING ROUTINES IN THE PROGRAM BEING SIMULATED,
AND HE DETECTS AN ERROR, HE MAY WISH TO ABORT HIS PROGRAM RATHER THAN END
IT (I.E., CALL EXIT). SOAPSUDS HAS AN OPTION WHICH WILL PRINT OUT THE
MESSAGE

%%%%%    EXECUTION TERMINATED BY ABORT REQUESTED BY PROGRAM (DURING
SIMULATION OF CPU 23, P=007327) DURING CYCLE 1324

AND THEN ABORT THE PROGRAM. THE CPU NUMBER AND P REGISTER ARE PRINTED
IN OCTAL, AND THE CYCLE COUNT IN DECIMAL, AS USUAL.

TO ABORT, FOR EXAMPLE, IF CONTROL PASSES TO LOCATION ILLEGAL, PUT AT
LOCATION ILLEGAL

      CALL ABORT

OR

      CALL TRAP (2,ILLEGAL,ABORT,0)


SUMMARY OF MISCELLANEOUS SOAPSUDS FEATURES

THE FOLLOWING TABLE SUMMARIZES THE ARGUMENTS OF CALLS TO VARIOUS
FUNCTIONS OF THE SOAPSUDS SYSTEM. AGAIN, FOR DETAILS SEE THE SPECIFIC
DISCUSSIONS OF THE FUNCTIONS IN THE TEXT.

FUNCTION                      PAR1    PAR2    PAR3      PAR4    CALL

TO PRINT OUT CYCLE TRACE      1                                 CYCLTRC
TO TURN OFF CYCLE TRACE       0                                 CYCLTRC

TO TURN ON A CYCLE TRAP       CYCLE   LOCN                      CYCTRP

TO START IDLE TIME INCR                                         USELESS
  BY SETTING TRAP*            TYPE    ADR     USELESS   (CPU)   TRAP
TO START RUNNING TIME INCR                                      USEFUL
  BY SETTING TRAP*            TYPE    ADR     USEFUL    (CPU)   TRAP
TO START SYSTEMS TIME INCR                                      SYSTEMS
  BY SETTING TRAP*            TYPE    ADR     SYSTEMS   (CPU)   TRAP
TO START TRACING TIME INCR                                      TRACING
  BY SETTING TRAP*            TYPE    ADR     TRACING   (CPU)   TRAP

TO SOAPSNAP DURING TRAP
  OR SIMULATION                                                 SOAPSNP
  BY SETTING TRAP*            TYPE    ADR     SOAPSNP   (CPU)   TRAP
TO SNAP DURING TRAP
  OR SIMULATION                                                 SNAP
  BY SETTING TRAP*            TYPE    ADR     SNAP      (CPU)   TRAP

TO ABORT DURING TRAP
  OR SIMULATION                                                 ABORT
  BY SETTING TRAP*            TYPE    ADR     ABORT     (CPU)   TRAP

* THESE ENTRIES ARE MERELY TO INDICATE THAT, SINCE THESE SUBROUTINES REQUIRE
NO ARGUMENTS, THEY MAY BE CALLED DIRECTLY UPON TRAPPING WITHOUT THE USE OF
AN INTERMEDIATE TRAP ROUTINE TO CALL THE OPTION. THEY MAY BE CALLED BY ANY
OF THE FOUR STANDARD TYPES OF TRAPS (FOR A DESCRIPTION OF WHICH, SEE 'USING
THE TRAPPING AND TRACING SYSTEM', ABOVE). THEY MAY ALSO BE CALLED FROM CHECK
TRAPS (DESCRIBED UNDER 'LOCATION CHECKING OPTIONS', ABOVE).

APPENDIX I
PLANS FOR THE FUTURE

IT SHOULD BE CLEAR FROM THE DESCRIPTION OF THE SYSTEM GIVEN IN
THE TEXT THAT SOAPSUDS WILL BE A USEFUL AND FLEXIBLE TOOL FOR
EXPERIMENTATION WITH PARALLEL PROCESSING PROCEDURES. THERE ARE, HOWEVER,
SEVERAL ASPECTS OF SUCH PROCEDURES THAT ARE NOT BROUGHT OUT CLEARLY
ENOUGH BY THE PRESENT SYSTEM. THESE INCLUDE A MEANS OF RECORDING THE
SIMULTANEOUS USAGE OF FUNCTIONAL HARDWARE UNITS AND FAST REGISTERS,
MEMORY CONFLICTS, AND AN ACCURATE ESTIMATE OF THE IDLING TIME VS.
RUNNING TIME RELATIONSHIP. IN ADDITION, CERTAIN FEATURES SHOULD BE ADDED
TO THE SOAPSUDS REPERTOIRE TO ENABLE A STUDY OF THE FEASIBILITY AND
DESIRABILITY OF VARIOUS POSSIBLE SYSTEM CONFIGURATIONS. SHOULD ONE, FOR
EXAMPLE, SELECT A PARTICULAR CPU AS AN EXECUTIVE CONTROLLING THE
OPERATION OF THE OTHER PROCESSORS? SHOULD THIS SELECTION BE FLEXIBLE TO
ALLOW SHARING OF SYSTEM FUNCTIONS AMONG SEVERAL CPU'S AS REQUIRED?
SHOULD THE SYSTEM BE SET UP SO THAT AS MUCH AS POSSIBLE EACH CPU TAKES
CARE OF ITS OWN SYSTEM FUNCTIONS? FOR THESE REASONS, AN AUGMENTED
VERSION OF SOAPSUDS IS BEING DESIGNED, SOME CHARACTERISTICS OF WHICH
WILL NOW BE OUTLINED.

I.   IT IS INTUITIVELY CLEAR THAT A PARALLEL PROCESSOR WILL NOT REQUIRE
A FULL SET OF FUNCTIONAL UNITS (I.E., THE HARDWARE FOR MULTIPLICATION,
DIVISION, ETC.) FOR EACH CPU. IN ORDER TO GET A CONCRETE PICTURE OF THE
AMOUNT OF SHARING THAT CAN EFFICIENTLY BE TOLERATED, A FUNCTIONAL UNIT
USE ANALYZER WILL BE INCORPORATED INTO THE SYSTEM ALONG WITH THE ABILITY
TO VARY THE NUMBER OF FUNCTIONAL UNITS TO BE ALLOWED (AND A
DETERMINATION OF THE LENGTHS OF ANY RESULTING QUEUES).

II.  THE AUGMENTED SYSTEM WILL DISPLAY THE AMOUNT OF TIME LOSS CAUSED BY
CONFLICTS IN MEMORY REFERENCING. THIS FEATURE WILL BE SO WRITTEN AS TO
ALLOW VARIATIONS IN THE DEFINITION OF 'MEMORY CONFLICT.'

III. IT WOULD BE DESIRABLE TO MEASURE THE USEFULNESS OF HAVING A POOL
OF FAST REGISTERS THAT ALL THE CPU'S COULD DRAW FROM RATHER THAN A
FIXED CONTINGENT FOR EACH CPU. A SCHEME FOR ACCOMPLISHING A REASONABLE
ESTIMATE FOR THIS WILL BE ADDED TO THE AUGMENTED SYSTEM.

IV.  SOAPSUDS CURRENTLY DOES ONLY A ROUGH ESTIMATE OF THE COMPUTER TIME
USED BY THE CPU'S. THE AUGMENTED VERSION WILL MORE TRULY REFLECT THIS
AND ALLOW REDEFINITION OF INSTRUCTION TIMES FOR EXPERIMENTATION.

V.   IN ORDER TO FACILITATE EXECUTIVE CONTROL OF THE OPERATIONS OF THE
SYSTEM, AN EXCHANGE JUMP* INSTRUCTION WILL BE INCORPORATED IN THE NEW
SYSTEM.

VI.  FINALLY, THE STORAGE PROTECTION FEATURES OF SOAPSUDS, WHICH ARE
QUITE LIMITED AT PRESENT, WILL BE AUGMENTED BY A MORE FLEXIBLE SETUP.

* CF. 6600 REF. MANUAL 3-9

APPENDIX II
ATHENA, AN OPERATING SYSTEM FOR SOAPSUDS

(E. DRAUGHON SYSTEM)


AS PART OF THE EXPERIMENTATION WITH SOAPSUDS, PLANS HAVE BEEN MADE
FOR A POSSIBLE OPERATING SYSTEM FOR A PARALLEL MULTIPROCESSOR CONFIGURATION
LIKE THAT SIMULATED BY SOAPSUDS. TESTING OF THIS OPERATING SYSTEM UNDER
AN EXTENDED VERSION OF SOAPSUDS WILL YIELD DATA ON THE FEASIBILITY AND
EFFICIENCY OF VARIOUS TECHNIQUES REQUIRED FOR THE ORGANIZATION AND CONTROL
OF A MULTIPROCESSOR SYSTEM.

THE OPERATING SYSTEM DESCRIBED BELOW, ATHENA, USES THE PRINCIPLE OF
TASKING IN ORDER TO SUBDIVIDE THE JOBS BEING RUN TO PERMIT PARALLEL
PROCESSING. A TASK IS THE EXECUTION OF A PARTICULAR SECTION OF CODE BY
ONE PROCESSOR. NOTE THAT THERE IS NOT A ONE-TO-ONE CORRESPONDENCE OF TASKS
AND SECTIONS OF CODE: ANY NUMBER OF TASKS MAY ALL REFER TO THE SAME SECTION
OF THE PROGRAM. IN A VERY LARGE DO LOOP, FOR EXAMPLE, EACH PASS THROUGH
THE DO LOOP (I.E., EACH VALUE OF THE ITERATION VARIABLE) MAY BE MADE INTO
A SEPARATE TASK. THE BASIC FACT ABOUT A TASK IS THAT, ONCE IT BEGINS
EXECUTION, IT DOES NOT WAIT FOR SYSTEM APPROVAL TO EXECUTE ANY SUBSECTION
OF THE TASK. THAT IS, THE SYSTEM INITIATES EXECUTION OF A TASK ONLY WHEN
IT IS ALL RIGHT FOR THE TASK TO EXECUTE TO
COMPLETION. THE TASKS BELONGING TO A PARTICULAR JOB CAN, OF COURSE, STILL
COMMUNICATE WITH EACH OTHER THROUGH STORAGE FLAGS, SO THAT ONE TASK MAY
PAUSE IN THE MIDDLE OF ITS CODE WAITING FOR A PARTICULAR FLAG TO BE SET IN
MEMORY. HOWEVER, UNLESS THE EXPECTED WAIT IS VERY SHORT, SUCH PROCEDURES
DEFEAT THE PURPOSE OF A TASKING SYSTEM, SINCE THE IDEA IS TO EXECUTE FIRST,
AT ANY GIVEN TIME, THOSE SECTIONS OF CODE WHICH MAY BE PROCESSED TO COMPLETION
WITHOUT PAUSE. THIS MINIMIZES THE TIME THAT THE CENTRAL PROCESSORS ARE
WAITING IDLY IN TASKS, AND THUS PRESUMABLY MAXIMIZES THE COMPUTING POWER
OF THE MACHINE.

WHETHER THIS SYSTEM IS IN FACT THE MOST EFFICIENT WILL DEPEND LARGELY
ON THE PROPORTION OF TIME REQUIRED FOR SYSTEM OVERHEAD. AFTER THE
COMPLETION OF EACH TASK, THE PROCESSOR RETURNS TO A SYSTEM ROUTINE KNOWN AS
THE RESIDENT. IT IS THE JOB OF THE RESIDENT TO DETERMINE WHAT TASK, IF
ANY, IS READY TO BE EXECUTED. IF IT FINDS ONE, THE PROCESSOR TRANSFERS
CONTROL TO THIS TASK. IN ADDITION TO THE RESIDENT EXECUTED BY EVERY PROCESSOR,
AT LEAST ONE PROCESSOR MUST BE SET ASIDE TO COORDINATE THE ENTIRE SYSTEM AND
REGULATE OVERALL JOB THROUGHPUT.

BELOW IS A DESCRIPTION OF THE SPECIFIC ROUTINES IN THE ATHENA SYSTEM,
AND THE FORTRAN SUBROUTINES WHICH HAVE BEEN WRITTEN TO MAKE THE SYSTEM
EASILY USABLE.


ATHENA.

I.     LISTS.

INPUT LIST.    LIST OF EACH MAIN PROGRAM OR JOB IN THE MACHINE.
JOB LIST.      LIST OF ALL THOSE MAIN PROGRAMS THAT CURRENTLY
               HAVE TASKS TO BE PERFORMED.
TASK LISTS.    ONE LIST FOR EACH MAIN PROGRAM OR JOB.
               EACH LIST LISTS ALL THE TASKS WHICH THAT MAIN
               PROGRAM HAS CURRENTLY TO BE PERFORMED.

II.    METHOD.

COORDINATOR.

1.  KEEPS THE JOB LIST UP TO DATE BY LOOKING AT EACH
    TASK LIST IN THE SYSTEM. IF THE TASK LIST IS EMPTY,
    THE JOB IS TAKEN OFF THE JOB LIST. OTHERWISE THE
    JOB REMAINS ON THE JOB LIST OR IS ADDED TO THE JOB
    LIST.

RESIDENT.

1.  WAITS FOR A JOB TO APPEAR ON THE JOB LIST.

2.  WHEN IT DOES, THE JOB'S TASK LIST IS SEARCHED.
    IF ANY TASKS REMAIN TO BE DONE, THE PROCESSOR EXITS
    AND DOES THE TASK. OTHERWISE, IT RETURNS TO LOOK FOR
    THE NEXT JOB IN THE JOB LIST.

3.  WHEN A RELEASE IS MADE, THE PROCESSOR EXITS TO
    SYSTEM STATUS, ADDS THE TASK TO THE TASK LIST FOR THIS
    JOB AND EXITS BACK TO PROGRAM STATUS AND CONTINUES
    EXECUTING THE PROGRAM.

4.  WHEN A RETURN IS ENCOUNTERED, THE PROCESSOR EXITS TO
    SYSTEM STATUS AND BEGINS SEARCHING THE TASK LIST AGAIN
    (AS IN 2. ABOVE).
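
THE INTERPLAY OF THE JOB LIST, THE TASK LISTS, THE COORDINATOR AND THE
RESIDENT CAN BE SKETCHED IN PYTHON AS FOLLOWS. THIS IS A SINGLE-THREADED
MODEL UNDER STATED ASSUMPTIONS, NOT THE ATHENA CODE. THE CLASS NAME Athena
AND ITS METHOD NAMES ARE HYPOTHETICAL, TASKS ARE REPRESENTED BY PLAIN
STRINGS, AND THE RESIDENT IS ASSUMED TO TAKE TASKS IN FIRST-IN, FIRST-OUT
ORDER, WHICH THE TEXT DOES NOT SPECIFY.

```python
from collections import deque


class Athena:
    """Hypothetical model of the job list and the per-job task lists."""

    def __init__(self, jobs):
        self.task_lists = {job: deque() for job in jobs}   # one task list per job
        self.job_list = []                                 # jobs with pending tasks

    def release(self, job, tasks):
        """RELEASE: add tasks to the task list of `job`."""
        self.task_lists[job].extend(tasks)

    def coordinate(self):
        """Coordinator: keep on the job list exactly the jobs that have tasks."""
        self.job_list = [job for job, tasks in self.task_lists.items() if tasks]

    def resident(self):
        """Resident: return (job, task) for the next ready task, or None."""
        for job in self.job_list:
            if self.task_lists[job]:
                return job, self.task_lists[job].popleft()
        return None


system = Athena(["A", "B"])
system.release("B", ["t1", "t2"])
system.coordinate()                  # job B now appears on the job list
first = system.resident()            # a free processor picks up task t1 of job B
```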


III.   USAGE.

1.  TO HANDLE A TASK WITH ARGUMENTS...

SUPPOSE THAT THE TASK IS CALLED SUB AND ONE WANTS TO START
IT 3 TIMES SIMULTANEOUSLY WITH DIFFERENT ARGUMENTS.

      CALL RELEASE (14S,3,6S)
14    GO TO 21
      GO TO 22
      GO TO 23
21    CALL SUB(ARG1,ARG2, ... ,ARGN)
22    CALL SUB(ARG1,ARG2, ... ,ARGN)
23    CALL SUB(ARG1,ARG2, ... ,ARGN)
6     CONTINUE

2.  TO HANDLE PRINT STATEMENTS PRINTED BY MORE THAN ONE
PROCESSOR

      CALL SKIPPER(0)
      PRINT 5
5     FORMAT (* THIS IS A PRINT STATEMENT WITHIN THE PROG
     X   RAM TO BE EXECUTED BY ANY NUMBER OF PROCESSORS.*)
      CALL SKIPPER(1)

3.  TO GIVE UP A PROCESSOR TO THE SYSTEM...

      CALL RETURN

4.  TO RELEASE A NUMBER OF TASKS TO BEGIN EXECUTING AS SOON
AS POSSIBLE...

      CALL RELEASE(LIST,N,CONTIN)

(IN THIS CALL, LIST IS THE BEGINNING ADDRESS OF A LIST OF TASK
ADDRESSES. THE ADDRESSES ARE TO BE RIGHT JUSTIFIED IN THE LEFT
HALF OF THE WORD, SUCH AS WOULD BE GENERATED BY A GO TO
STATEMENT IN FORTRAN. N IS THE NUMBER OF TASKS BEING RELEASED.
CONTIN IS THE STATEMENT NUMBER OF THE NEXT EXECUTABLE INSTRUCTION
AFTER THE CALL, OR THE ADDRESS OF SUCH.)

(IF THE VALUE OF N IS -1, AS MANY PROCESSORS AS ARE
AVAILABLE WILL BEGIN EXECUTING THE TASK. OTHER PROCESSORS
WILL JOIN IN EXECUTING THE TASK UNTIL A CALL TAKEOFF IS
EXECUTED.)


5.  TO PREVENT FURTHER PROCESSORS FROM ENTERING A TASK AT WILL,
EXECUTE A...

      CALL TAKEOFF(ADD,CONTIN)

(ADD IS THE ADDRESS OF THE TASK, CONTIN THE NEXT EXECUTABLE
INSTRUCTION.)

6.  TO RETURN ALL PROCESSORS BUT ONE TO THE SYSTEM...

      CALL IFTHRU (KALLIES,N,CONTIN)

(KALLIES IS AN OTHERWISE UNUSED LOCATION, N IS THE NUMBER
OF TASKS WHICH ARE TO BE FINISHED BEFORE THE NEXT STATEMENT
(AT ADDRESS CONTIN) IS EXECUTED.)
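
ONE PLAUSIBLE READING OF IFTHRU, OFFERED AS AN ASSUMPTION RATHER THAN AS THE
ATHENA IMPLEMENTATION, IS THAT KALLIES SERVES AS A SHARED COUNTER OF
FINISHED TASKS: EVERY PROCESSOR BUT THE LAST TO ARRIVE IS GIVEN BACK TO THE
SYSTEM, AND THE ONE THAT COMPLETES THE N-TH TASK GOES ON TO CONTIN. THE
PYTHON SKETCH BELOW SHOWS THAT IDEA. THE CLASS IfThru AND THE METHOD arrive
ARE HYPOTHETICAL NAMES.

```python
import threading


class IfThru:
    """Hypothetical join counter standing in for the KALLIES location."""

    def __init__(self, n):
        self.n = n                        # number of tasks that must finish
        self.count = 0                    # plays the role of KALLIES
        self.lock = threading.Lock()      # makes the update indivisible

    def arrive(self):
        """Return True only for the caller that completes the n-th task."""
        with self.lock:
            self.count += 1
            return self.count == self.n


join = IfThru(3)
outcomes = [join.arrive() for _ in range(3)]   # three tasks finish in turn
```

ONLY THE LAST OF THE THREE ARRIVALS IS TOLD TO CONTINUE. THE LOCK STANDS IN
FOR WHATEVER INDIVISIBLE UPDATE THE REAL SYSTEM WOULD USE ON THE SHARED
LOCATION.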

7.  TO HANDLE PARALLEL DO LOOPS...

      CALL DO (OK,K,INC,LIM,CONTIN,NUMB)

(OK IS THE PUBLIC ITERATION VARIABLE. K IS THE PRIVATE
ITERATION VARIABLE. INC IS THE AMOUNT OF INCREMENT.
LIM IS THE LIMITING VALUE OF THE ITERATION VARIABLE.
NUMB IS THE NUMBER OF PROCESSORS TO BEGIN THE DO LOOP.
THE ITERATION VARIABLE OK MUST BE INITIALIZED BEFORE THE CALL
TO THIS ROUTINE. THE CALL ABOVE REPLACES THE DO STATEMENT
WHICH IS TO BE PARALLEL.)
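
THE DIVISION OF LABOR BETWEEN THE PUBLIC AND PRIVATE ITERATION VARIABLES
CAN BE ILLUSTRATED WITH THE PYTHON SKETCH BELOW. IT IS A MODEL UNDER STATED
ASSUMPTIONS, NOT THE ATHENA DO ROUTINE. THE FUNCTION NAME parallel_do IS
HYPOTHETICAL, THREADS STAND IN FOR PROCESSORS, A POSITIVE INCREMENT IS
ASSUMED, AND A LOCK STANDS IN FOR WHATEVER INDIVISIBLE UPDATE THE REAL
SYSTEM WOULD APPLY TO THE PUBLIC VARIABLE.

```python
import threading


def parallel_do(start, inc, lim, numb, body):
    """Run body(k) for k = start, start+inc, ... up to lim on `numb` workers.

    The shared counter plays the role of the public iteration variable;
    each worker copies it into a private k before advancing it.
    """
    public = {"k": start}
    lock = threading.Lock()

    def worker():
        while True:
            with lock:                    # claim the next iteration atomically
                k = public["k"]           # private copy of the public variable
                if k > lim:
                    return                # no iterations left for this worker
                public["k"] = k + inc
            body(k)                       # the loop body runs outside the lock

    threads = [threading.Thread(target=worker) for _ in range(numb)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


done = []
parallel_do(1, 2, 9, 3, done.append)      # iterations 1, 3, 5, 7, 9 on 3 workers
```

EACH ITERATION VALUE IS CLAIMED BY EXACTLY ONE WORKER, THOUGH THE ORDER IN
WHICH THE VALUES ARE PROCESSED MAY VARY FROM RUN TO RUN.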